On Wed, Jul 31, 2013 at 4:56 AM, Bill Bell billnb...@gmail.com wrote:
On Jul 30, 2013, at 12:34 PM, Dotan Cohen dotanco...@gmail.com wrote:
On Tue, Jul 30, 2013 at 9:21 PM, Aloke Ghoshal alghos...@gmail.com wrote:
Does adding facet.mincount=2 help?
In fact, when adding facet.mincount=20 (I
fwiw,
this code won't capture uncommitted duplicates.
On Wed, Jul 31, 2013 at 9:41 AM, Dotan Cohen dotanco...@gmail.com wrote:
On Tue, Jul 30, 2013 at 11:14 PM, Jack Krupansky
j...@basetechnology.com wrote:
The Solr SignatureUpdateProcessorFactory is designed to facilitate
dedupe...
any
Good to note!
But... any search will not detect dupe IDs for uncommitted documents.
-- Jack Krupansky
-----Original Message-----
From: Mikhail Khludnev
Sent: Wednesday, July 31, 2013 6:11 AM
To: solr-user
Subject: Re: How might one search for dupe IDs other than faceting on the ID
field
Does adding facet.mincount=2 help?
On Tue, Jul 30, 2013 at 11:46 PM, Dotan Cohen dotanco...@gmail.com wrote:
To search for duplicate IDs, I am running the following query:
select?q=*:*&facet=true&facet.field=id&rows=0
However, since upgrading from Solr 4.1 to Solr 4.3 I am receiving
On 7/30/2013 12:16 PM, Dotan Cohen wrote:
To search for duplicate IDs, I am running the following query:
select?q=*:*&facet=true&facet.field=id&rows=0
However, since upgrading from Solr 4.1 to Solr 4.3 I am receiving
OutOfMemoryError errors instead of the desired facet:
snip
Might there be a
Are you talking about the document's ID field?
If so, you can't have duplicates... the latter document would overwrite the
earlier.
If not, sorry for asking irrelevant questions. :)
Michael Della Bitta
Applications Developer
o: +1 646 532 3062 | c: +1 917 477 7906
appinions inc.
“The
On Tue, Jul 30, 2013 at 9:21 PM, Aloke Ghoshal alghos...@gmail.com wrote:
Does adding facet.mincount=2 help?
In fact, when adding facet.mincount=20 (I know that some dupes are in
the hundreds) I got the OutOfMemoryError in seconds instead of
minutes.
--
Dotan Cohen
http://gibberish.co.il
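
For reference, the query discussed above (faceting on the id field with facet.mincount so that only values occurring more than once are returned) can be assembled as in the minimal sketch below. The base URL and the function name are assumptions made for illustration; only the query parameters come from the thread. Note that, per the report above, adding facet.mincount did not avoid the OutOfMemoryError, so this shows the shape of the request and is not a fix.

```python
from urllib.parse import urlencode


def duplicate_id_facet_url(base_url, field="id", mincount=2):
    """Build a Solr select URL that facets on `field` and keeps only
    facet values seen at least `mincount` times.

    `base_url` is a hypothetical core URL; adjust it to your deployment.
    """
    params = [
        ("q", "*:*"),                  # match every document
        ("rows", 0),                   # no documents wanted, only facet counts
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", mincount),  # 2 hides values that occur only once
    ]
    return base_url + "/select?" + urlencode(params)


# Hypothetical host and core name, for illustration only.
url = duplicate_id_facet_url("http://localhost:8983/solr/collection1")
print(url)
```

urlencode percent-encodes the `*:*` query, so the printed URL can be passed directly to curl or a browser.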
On Tue, Jul 30, 2013 at 9:23 PM, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
Are you talking about the document's ID field?
If so, you can't have duplicates... the latter document would overwrite the
earlier.
If not, sorry for asking irrelevant questions. :)
In Solr 4.1 we
Since this is a one-time problem, have you thought of just dumping all the
IDs and looking for dupes using sort and awk or something similar to that?
Michael Della Bitta
Applications Developer
o: +1 646 532 3062 | c: +1 917 477 7906
appinions inc.
“The Science of Influence Marketing”
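
The "dump all the IDs and look for dupes" idea above could look roughly like the sketch below. It assumes the IDs have already been exported to a text file, one per line, and sorted (for example with the sort utility), so that equal IDs sit next to each other; the file layout and the function names are assumptions, not something given in the thread. Because it only compares adjacent lines, memory use stays flat regardless of how many IDs there are.

```python
from itertools import groupby


def read_ids(path):
    """Yield non-empty, stripped lines from a file of IDs, one per line."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield line


def duplicate_ids(sorted_ids):
    """Yield (id, count) for every ID that occurs more than once.

    Assumes the input is already sorted, so equal IDs are adjacent;
    unsorted input would miss duplicates that are not neighbours.
    """
    for doc_id, group in groupby(sorted_ids):
        count = sum(1 for _ in group)
        if count > 1:
            yield doc_id, count


# Small in-memory example; with a real dump, pass read_ids("ids.sorted.txt")
# (a hypothetical file name) instead of the list.
sample = ["a", "b", "b", "c", "c", "c"]
print(list(duplicate_ids(sample)))
```

This is the same idea as piping the sorted dump through uniq -d, written out so the counts are available as well.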
On Tue, Jul 30, 2013 at 9:43 PM, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
Since this is a one-time problem, have you thought of just dumping all the
IDs and looking for dupes using sort and awk or something similar to that?
All 100,000,000 of them :) That would take even
On 7/30/2013 12:49 PM, Dotan Cohen wrote:
Thanks, the query ran for almost 2 full minutes but it returned
results! I'll google for how to increase the disk cache for queries
like this. Other than the Qtime, is there no way to judge the amount
of memory required for a particular query to run?
Dotan,
Could you please provide more lines of the stack trace?
I have no idea why it got worse in 4.3. I know that 4.3 can use facets
backed by DocValues, which are modest on the heap. But from what I saw
(I may be wrong), it's disabled for numeric facets. Hence, I can suggest
reindexing id as
Message-
From: Jack Krupansky
Sent: Tuesday, July 30, 2013 4:14 PM
To: solr-user@lucene.apache.org
Subject: Re: How might one search for dupe IDs other than faceting on the ID
field?
The Solr SignatureUpdateProcessorFactory is designed to facilitate dedupe...
any particular reason you did
This seems like a fairly large issue. Can you create a Jira issue?
Bill Bell
Sent from mobile
On Jul 30, 2013, at 12:34 PM, Dotan Cohen dotanco...@gmail.com wrote:
On Tue, Jul 30, 2013 at 9:21 PM, Aloke Ghoshal alghos...@gmail.com wrote:
Does adding facet.mincount=2 help?
In fact,
On Tue, Jul 30, 2013 at 9:56 PM, Shawn Heisey s...@elyograg.org wrote:
On 7/30/2013 12:49 PM, Dotan Cohen wrote:
Thanks, the query ran for almost 2 full minutes but it returned
results! I'll google for how to increase the disk cache for queries
like this. Other than the Qtime, is there no
On Tue, Jul 30, 2013 at 11:00 PM, Mikhail Khludnev
mkhlud...@griddynamics.com wrote:
Dotan,
Could you please provide more lines of the stack trace?
Sure, thanks:
<response><lst name="error"><str
name="msg">java.lang.OutOfMemoryError: Java heap space</str><str
name="trace">java.lang.RuntimeException:
On Tue, Jul 30, 2013 at 11:14 PM, Jack Krupansky
j...@basetechnology.com wrote:
The Solr SignatureUpdateProcessorFactory is designed to facilitate dedupe...
any particular reason you did not use it?
See:
http://wiki.apache.org/solr/Deduplication
and