Hello Steven!
Steven Rowe (JIRA) wrote:
>
> Also, I don't see Swedish among the hyphenation data licenses - is it covered
> in some other way?
>
I have a Swedish grammar file now. If you are interested drop me a note.
It is not that hard to generate them from the TeX files.
CU
Thomas
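TeX hyphenation files (the `\patterns{...}` block) use Liang's encoding: letters interleaved with digits, where a digit in the gap between two letters is a hyphenation weight and `.` marks a word boundary. Below is a minimal sketch of the parsing half of such a conversion, assuming one pattern string per call. The class name `TexPattern` is hypothetical. The XML layout the grammar files use is not shown in this thread, so emitting it is left out.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene or of the attached patch.
final class TexPattern {
    final String letters; // pattern with the digits removed, e.g. ".ach" for ".ach4"
    final int[] values;   // values[i] is the digit before letters.charAt(i); the last slot follows the final letter

    private TexPattern(String letters, int[] values) {
        this.letters = letters;
        this.values = values;
    }

    // Parses one TeX pattern such as "1ba" or ".ach4"; a gap without a digit gets 0.
    static TexPattern parse(String pattern) {
        StringBuilder letters = new StringBuilder();
        int[] values = new int[pattern.length() + 1];
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c >= '0' && c <= '9') {
                values[letters.length()] = c - '0'; // digit sits in the gap before the next letter
            } else {
                letters.append(c);
            }
        }
        // One value per gap: one before each letter plus one after the last letter.
        return new TexPattern(letters.toString(), Arrays.copyOf(values, letters.length() + 1));
    }

    @Override
    public String toString() {
        return letters + " " + Arrays.toString(values);
    }
}
```

In Liang's scheme, all patterns that match a word are applied, and the maximum value is kept at each gap. An odd result marks an allowed hyphenation point.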
-
[
https://issues.apache.org/jira/browse/LUCENE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566447#action_12566447
]
Steven Rowe commented on LUCENE-1157:
-
Okay - it's available now at:
[http://hudson.z
That is the problem, waiting for the full sync (of all of the segment
files) takes quite a while... syncing a single log file is much more
efficient.
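Syncing just one log file can be sketched with `FileChannel.force`, which asks the OS to write a single file's data to the device. Below is a minimal sketch, assuming an append-only text log. `TxLog` and its one-record-per-line format are hypothetical. As noted elsewhere in this thread, this only helps if the drive does not lie about completed writes. The `force` contract also makes no promise for files on non-local devices.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical append-only transaction log (not a Lucene class).
final class TxLog implements AutoCloseable {
    private final FileChannel channel;

    TxLog(Path file) throws IOException {
        // APPEND: every write lands at the current end of the file.
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    // Appends one record and returns only after force() has completed for this one file.
    void append(String record) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        // true = flush metadata too, so the grown file length is persisted with the data.
        channel.force(true);
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Forcing after every record is the simplest policy. Batching several records per `force()` (group commit) trades a little latency for throughput.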
On Feb 6, 2008, at 9:41 PM, Andrew Zhang wrote:
On Feb 7, 2008 7:22 AM, robert engels <[EMAIL PROTECTED]> wrote:
That doesn't help, with la
On Feb 7, 2008 7:22 AM, robert engels <[EMAIL PROTECTED]> wrote:
> That doesn't help, with lazy writing/buffering by the OS, there is no
> guarantee that if the last written block is ok, that earlier blocks
> in the file are
>
> The OS/drive is going to physically write them in the most effici
On Feb 6, 2008, at 6:42 PM, Mark Miller wrote:
Hey DM,
Just to recap an earlier thread, you need the sync and you need
hardware that doesn't lie to you about the result of the sync.
Here is an excerpt about Digg running into that issue:
"They had problems with their storage system telling
I'm pretty sure that what you describe is the case, especially taking into
consideration that PageRank (what drives their search results) is a per-document
value that is probably recomputed after some long time interval. I
did see a MapReduce algorithm to compute PageRank as well. However I do
think
(trimming excessive cc-s)
Ning Li wrote:
No. I'm curious too. :)
On Feb 6, 2008 11:44 AM, J. Delgado <[EMAIL PROTECTED]> wrote:
I assume that Google also has a distributed index over their
GFS/MapReduce implementation. Any idea how they achieve this?
I'm pretty sure that MapReduce/GFS/BigTabl
One main focus is to provide fault-tolerance in this distributed index
system. Correct me if I'm wrong, but I think SOLR-303 is focusing on merging
results from multiple shards right now. We'd like to start an open source
project for a fault-tolerant distributed index system (or join if one
already exi
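Merging results from several shards, as mentioned for SOLR-303, can be done as a k-way merge of per-shard hit lists. Below is a minimal sketch with these assumptions:
- Each shard returns its hits already sorted by descending score.
- Scores are comparable across shards, for instance because they were computed with the same idf statistics.

`ShardMerger` and `Hit` are hypothetical names, not Solr's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical merge of per-shard results into one global top-k list.
final class ShardMerger {
    // Hypothetical hit: which shard, the doc id within that shard, and its score.
    static final class Hit {
        final int shard;
        final int doc;
        final float score;

        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }

        @Override
        public String toString() {
            return shard + ":" + doc;
        }
    }

    // Merges per-shard lists (each sorted by descending score) into the global top k.
    static List<Hit> mergeTopK(List<List<Hit>> perShard, int k) {
        // Queue entries are {shardIndex, positionInThatShardsList}; highest score first.
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Float.compare(
                perShard.get(b[0]).get(b[1]).score, perShard.get(a[0]).get(a[1]).score));
        for (int s = 0; s < perShard.size(); s++) {
            // A shard that failed to answer can be passed as an empty list.
            if (!perShard.get(s).isEmpty()) {
                pq.add(new int[]{s, 0});
            }
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < k && !pq.isEmpty()) {
            int[] top = pq.poll();
            List<Hit> list = perShard.get(top[0]);
            merged.add(list.get(top[1]));
            if (top[1] + 1 < list.size()) {
                pq.add(new int[]{top[0], top[1] + 1}); // advance within that shard's list
            }
        }
        return merged;
    }
}
```

Passing a missing shard as an empty list makes the merge degrade to partial results instead of failing. That is one possible fault-tolerance policy among several.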
[
https://issues.apache.org/jira/browse/LUCENE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566406#action_12566406
]
Nigel Daley commented on LUCENE-1157:
-
{quote}
job/Lucene-trunk/ws/ sounds like a temp
No. I'm curious too. :)
On Feb 6, 2008 11:44 AM, J. Delgado <[EMAIL PROTECTED]> wrote:
> I assume that Google also has a distributed index over their
> GFS/MapReduce implementation. Any idea how they achieve this?
>
> J.D.
>
I work for IBM Research. I read the Rackspace article. Rackspace's Mailtrust
has a similar design. Happy to see an existing application on such a system.
Do they plan to open-source it? Is the AOL project an open source project?
On Feb 6, 2008 11:33 AM, Clay Webster <[EMAIL PROTECTED]> wrote:
>
>
Hey DM,
Just to recap an earlier thread, you need the sync and you need hardware
that doesn't lie to you about the result of the sync.
Here is an excerpt about Digg running into that issue:
"They had problems with their storage system telling them writes were on
disk when they really weren't
That doesn't help, with lazy writing/buffering by the OS, there is no
guarantee that if the last written block is ok, that earlier blocks
in the file are
The OS/drive is going to physically write them in the most efficient
manner. Only after a sync would this hold true (which is what we
Yes, but this pruning could be more efficient. On a background
thread, get the current segment from the segments file, call the system-wide
sync (e.g. System.exec("fsync")), then you can purge the transaction
logs for all segments up to that one. Since it is a background
operation, you are not bloc
On Feb 6, 2008, at 5:42 PM, Michael McCandless wrote:
robert engels wrote:
Do we have any way of determining if a segment is definitely OK/VALID?
The only way I know is the CheckIndex tool, and it's rather slow (and
it's not clear that it always catches all corruption).
Just a thought.
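The background pruning idea is to sync, then purge the logs for every segment up to the synced one. It could be sketched as below. Java has no portable system-wide sync call, so this sketch swaps in a per-file `FileChannel.force(true)` on each segment file instead. The `txlog_<generation>` naming and the `LogPruner` class are hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical helper; assumes one log per segment generation, named "txlog_<gen>" with a numeric gen.
final class LogPruner {
    static final String PREFIX = "txlog_";

    // Forces each segment file's content and metadata to storage (per file, not system-wide).
    static void syncAll(List<Path> segmentFiles) throws IOException {
        for (Path f : segmentFiles) {
            try (FileChannel ch = FileChannel.open(f, StandardOpenOption.WRITE)) {
                ch.force(true);
            }
        }
    }

    // Deletes every log whose generation is <= syncedGen; returns how many were removed.
    static int purgeUpTo(Path logDir, long syncedGen) throws IOException {
        List<Path> doomed = new ArrayList<>();
        try (DirectoryStream<Path> logs = Files.newDirectoryStream(logDir, PREFIX + "*")) {
            for (Path log : logs) {
                String suffix = log.getFileName().toString().substring(PREFIX.length());
                if (Long.parseLong(suffix) <= syncedGen) {
                    doomed.add(log);
                }
            }
        }
        // Delete after the directory listing is closed rather than while iterating it.
        for (Path log : doomed) {
            Files.delete(log);
        }
        return doomed.size();
    }

    // Runs sync and purge on the executor's thread so indexing threads are not blocked.
    static Future<Integer> pruneInBackground(ExecutorService exec, List<Path> segmentFiles,
                                             Path logDir, long syncedGen) {
        return exec.submit(() -> {
            syncAll(segmentFiles); // the logs must not go until the segments are durable
            return purgeUpTo(logDir, syncedGen);
        });
    }
}
```

The ordering is the whole point. Logs are deleted only after `syncAll` has returned, so a crash between the two steps leaves extra logs behind rather than missing ones.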
robert engels wrote:
Do we have any way of determining if a segment is definitely OK/VALID?
The only way I know is the CheckIndex tool, and it's rather slow (and
it's not clear that it always catches all corruption).
If so, a much more efficient transactional system could be developed.
S
add compatibility statement to README.txt for all contribs
--
Key: LUCENE-1167
URL: https://issues.apache.org/jira/browse/LUCENE-1167
Project: Lucene - Java
Issue Type: Task
C
Clay Webster wrote:
There seem to be a few other players in this space too.
Are you from Rackspace?
(http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data)
AOL also has a Hadoop/Solr project going on.
CNET does not have much brewing there. Although Yo
Oh well, I ticked the "remove trailing white space" box.
The only real addition is at the end:
>* Easier and more efficient ways to add proximity scoring?
> +For example specialize Span-Near-Query for the case when all subqueries
> are terms.
Regards,
Paul Elschot
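When every subquery of a near query is a single term, the unordered near check reduces to finding the smallest window that contains one position of each term. Below is a minimal sketch of that check on plain sorted position arrays, assuming distinct terms and spans of length 1. `TermProximity` is hypothetical, and this is not how `SpanNearQuery` is implemented. As I understand it, though, the slop test mirrors its unordered rule: match length minus the total length of the sub-spans must not exceed the slop.

```java
// Hypothetical term-only proximity check; not Lucene's SpanNearQuery.
final class TermProximity {
    // Smallest (max - min) over all choices of one position per term, or -1 if any term is absent.
    // Each positions[t] must be sorted ascending.
    static int smallestWindow(int[][] positions) {
        int n = positions.length;
        if (n == 0) return -1;
        for (int[] p : positions) {
            if (p.length == 0) return -1;
        }
        int[] idx = new int[n]; // current pointer into each term's position list
        int best = Integer.MAX_VALUE;
        while (true) {
            int minTerm = 0;
            int max = Integer.MIN_VALUE;
            for (int t = 0; t < n; t++) {
                int pos = positions[t][idx[t]];
                if (pos < positions[minTerm][idx[minTerm]]) minTerm = t;
                if (pos > max) max = pos;
            }
            int min = positions[minTerm][idx[minTerm]];
            best = Math.min(best, max - min);
            // Only advancing the term at the minimum can shrink the window; stop when it runs out.
            if (++idx[minTerm] == positions[minTerm].length) return best;
        }
    }

    // Unordered near match where every sub-span has length 1.
    static boolean nearUnordered(int[][] positions, int slop) {
        int window = smallestWindow(positions);
        // Match length is window + 1; the n single-position spans account for n of it.
        return window >= 0 && (window + 1) - positions.length <= slop;
    }
}
```

For example, with positions `{1, 10}`, `{4, 12}` and `{2, 20}`, the smallest window is 1..4, so the check passes with slop 1 but fails with slop 0.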
On Thu, Jan 31, 2008 at 11:09 AM, Doron Cohen <[EMAIL PROTECTED]> wrote:
> Hi Otis,
>
> On Thu, Jan 31, 2008 at 7:21 AM, Otis Gospodnetic <
> [EMAIL PROTECTED]> wrote:
>
> > Doron - this looks super useful!
> > Can you give an example for the lexical affinities you mention here?
> > ("Juru creates
Hi Grant, yes I have these combinations - I just updated the wiki page with
these numbers.
I still have the index as described, allowing me to try other ideas that may
come up, or to run more tests (on GOV2 data) if we need them to make better decisions
...
Cheers, Doron
On Wed, Feb 6, 2008 at 2:15 PM, Grant In
I assume that Google also has a distributed index over their
GFS/MapReduce implementation. Any idea how they achieve this?
J.D.
On Feb 6, 2008 11:33 AM, Clay Webster <[EMAIL PROTECTED]> wrote:
>
> There seem to be a few other players in this space too.
>
> Are you from Rackspace?
> (http://highsc
There have been several proposals for a Lucene-based distributed index
architecture.
1) Doug Cutting's "Index Server Project Proposal" at
http://www.mail-archive.com/[EMAIL PROTECTED]/msg00338.html
2) Solr's "Distributed Search" at
http://wiki.apache.org/solr/DistributedSearch
3) Mark Bu
[
https://issues.apache.org/jira/browse/LUCENE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566257#action_12566257
]
Doron Cohen commented on LUCENE-1157:
-
Nice spying work, Steven :)
I am not familiar w
[
https://issues.apache.org/jira/browse/LUCENE-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566220#action_12566220
]
Thomas Peuss commented on LUCENE-1166:
--
bq. Looking at http://offo.sourceforge.net/hy
[
https://issues.apache.org/jira/browse/LUCENE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566206#action_12566206
]
Nigel Daley commented on LUCENE-1157:
-
I suggest you save the Changes.html as one of t
[
https://issues.apache.org/jira/browse/LUCENE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566198#action_12566198
]
Steven Rowe commented on LUCENE-1157:
-
Wait! I found it:
[http://hudson.zones.apache
[
https://issues.apache.org/jira/browse/LUCENE-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566188#action_12566188
]
Steven Rowe commented on LUCENE-1166:
-
Hi Thomas,
Looking at [http://offo.sourceforge
[
https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doron Cohen updated LUCENE-997:
---
Attachment: timeout.patch
> Add search timeout support to Lucene
> --
[
https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doron Cohen updated LUCENE-997:
---
Attachment: timeout.patch
Attached patch corrects default resolution comment.
> Add search timeout s
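A search timeout with a configurable resolution can be made cheap with a background thread that advances a coarse clock once per resolution period. The per-hit check then reads a volatile field instead of calling `System.currentTimeMillis()`. Below is a minimal sketch under that assumption. `TickTimer`, `TimeLimitedHits` and `TimeExceededException` are hypothetical names, and this is not the code in timeout.patch.

```java
// Hypothetical coarse clock: a daemon thread that advances a counter once per tick.
final class TickTimer extends Thread {
    private final long resolutionMs;
    private volatile long elapsedMs = 0; // coarse time since start(), in whole ticks

    TickTimer(long resolutionMs) {
        super("tick-timer");
        this.resolutionMs = resolutionMs;
        setDaemon(true); // the timer alone should not keep the JVM alive
    }

    @Override
    public void run() {
        while (!isInterrupted()) {
            try {
                Thread.sleep(resolutionMs);
            } catch (InterruptedException e) {
                return; // interrupt() stops the timer
            }
            elapsedMs += resolutionMs; // single writer, so += on the volatile is safe
        }
    }

    long nowMs() {
        return elapsedMs;
    }
}

// Hypothetical per-search hit counter that gives up once the allowed time has passed.
final class TimeLimitedHits {
    static final class TimeExceededException extends RuntimeException {
        TimeExceededException(long allowedMs, long elapsedMs) {
            super("allowed " + allowedMs + " ms, elapsed " + elapsedMs + " ms");
        }
    }

    private final TickTimer timer;
    private final long allowedMs;
    private final long startMs;
    int collected = 0;

    TimeLimitedHits(TickTimer timer, long allowedMs) {
        this.timer = timer;
        this.allowedMs = allowedMs;
        this.startMs = timer.nowMs();
    }

    // Called once per matching document; the check is a volatile read, not a clock call.
    void collect(int doc, float score) {
        long elapsed = timer.nowMs() - startMs;
        if (elapsed > allowedMs) {
            throw new TimeExceededException(allowedMs, elapsed);
        }
        collected++; // a real collector would record doc and score here
    }
}
```

Because the clock only moves in whole ticks, a timeout can fire up to about one resolution period early or late. A finer resolution reduces that error at the cost of more frequent wake-ups.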
[
https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566175#action_12566175
]
Doron Cohen commented on LUCENE-997:
Oh, I wrote that comment before I decided to chan
[
https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566171#action_12566171
]
Sean Timm commented on LUCENE-997:
--
Doron, your comment for setResolution(long) says "The
[
https://issues.apache.org/jira/browse/LUCENE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566167#action_12566167
]
Doron Cohen commented on LUCENE-1157:
-
I suspected something like this but wasn't sure
[
https://issues.apache.org/jira/browse/LUCENE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566163#action_12566163
]
Steven Rowe commented on LUCENE-1157:
-
If I browse to
[http://hudson.zones.apache.org
[
https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doron Cohen updated LUCENE-997:
---
Attachment: timeout.patch
Sean, thanks for adding the test.
In the attached I tightened the check of
Hey Doron,
I see you recommend that we think about making SweetSpot the default
similarity. Do you have numbers for running that alone? Or
for that matter, any of the other combinations of #3 individually?
Thanks,
Grant
On Jan 31, 2008, at 4:09 AM, Doron Cohen wrote:
Hi Otis,
[
https://issues.apache.org/jira/browse/LUCENE-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thomas Peuss updated LUCENE-1166:
-
Attachment: hyphenation.dtd
The DTD describing the hyphenation grammar XML files.
> A tokenfilt
[
https://issues.apache.org/jira/browse/LUCENE-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thomas Peuss updated LUCENE-1166:
-
Attachment: de.xml
A hyphenation grammar. You can download them from:
http://downloads.sourcefo
A tokenfilter to decompose compound words
-
Key: LUCENE-1166
URL: https://issues.apache.org/jira/browse/LUCENE-1166
Project: Lucene - Java
Issue Type: New Feature
Components: Analysis
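One simple way such a filter can find the parts of a compound is a brute-force dictionary lookup over every substring within a length range. Below is a minimal sketch of that lookup alone, outside Lucene's TokenStream API. `CompoundSplitter`, the length limits and the sample dictionary are hypothetical. The attached patch also ships hyphenation grammars and may choose candidate boundaries differently.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical dictionary-based decomposer; a real filter would emit the parts as extra tokens.
final class CompoundSplitter {
    private final Set<String> dictionary;
    private final int minSubword;
    private final int maxSubword;

    CompoundSplitter(Set<String> dictionary, int minSubword, int maxSubword) {
        this.dictionary = new HashSet<>();
        for (String w : dictionary) {
            this.dictionary.add(w.toLowerCase(Locale.ROOT)); // compare case-insensitively
        }
        this.minSubword = minSubword;
        this.maxSubword = maxSubword;
    }

    // Returns the original word first, then every dictionary word found inside it, left to right.
    List<String> decompose(String word) {
        List<String> parts = new ArrayList<>();
        parts.add(word);
        String lower = word.toLowerCase(Locale.ROOT);
        for (int start = 0; start < lower.length(); start++) {
            for (int len = minSubword; len <= maxSubword && start + len <= lower.length(); len++) {
                String candidate = lower.substring(start, start + len);
                if (dictionary.contains(candidate)) {
                    parts.add(candidate);
                }
            }
        }
        return parts;
    }
}
```

Overlapping matches are all emitted. If that is too noisy, one variant keeps only the longest match at each start position.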
[
https://issues.apache.org/jira/browse/LUCENE-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thomas Peuss updated LUCENE-1166:
-
Attachment: CompoundTokenFilter.patch
A preliminary version of the token filter.
> A tokenfilte