[Lucene.Net] Adding to an existing index is slow

2011-03-15 Thread Khoa Vo
I am trying to add a new document to an existing optimized index with about
600k documents.

It seems like the IndexWriter method AddDocument(doc) is taking a long time
(30 seconds or so).

Is this the usual behavior?

Note: Initially indexing all 600k documents only takes about 20 minutes.

It seems like something is blocking, but I am not sure.

Any insight would be helpful

-Khoa


[jira] Commented: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006805#comment-13006805
 ] 

Bill Bell commented on SOLR-2242:
-

OK this is complete.

Sample query:

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=cat&rows=0&facet.numfacetterms=2&facet.limit=4
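
For readers who want to issue the sample query from code, here is a minimal sketch of assembling that URL. The class and method names are hypothetical; only the parameter names and values come from the query above, and facet.numfacetterms exists only with the patch applied. Encoding turns the colon in q=*:* into %3A, which a servlet container decodes back.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Solr or SolrJ.
final class FacetQueryUrl {

    /** Joins the parameters with '&' and URL-encodes each name and value. */
    static String build(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&", base + "?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return query.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("facet", "true");
        params.put("facet.field", "cat");
        params.put("rows", "0");
        params.put("facet.numfacetterms", "2"); // parameter added by the patch
        params.put("facet.limit", "4");
        System.out.println(build("http://localhost:8983/solr/select", params));
    }
}
```

A Map holds one value per name, so a query that repeats facet.field would need a list of pairs instead.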

Sample output:
{code}
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="facet.numfacetterms">2</str>
      <str name="facet">true</str>
      <str name="q">*:*</str>
      <str name="facet.limit">4</str>
      <str name="facet.field">cat</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="17" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="cat">
        <int name="numFacetTerms">14</int>
        <lst name="counts">
          <int name="electronics">14</int>
          <int name="memory">3</int>
          <int name="connector">2</int>
          <int name="graphics card">2</int>
        </lst>
      </lst>
    </lst>
    <lst name="facet_dates"/>
    <lst name="facet_ranges"/>
  </lst>
</response>
{code}

In Json:

{code}
{"responseHeader":{"status":0,"QTime":0,"params":{"facet.numfacetterms":"2","facet":"true","q":"*:*","facet.limit":"4","facet.field":"cat","wt":"json","rows":"0"}},"response":{"numFound":17,"start":0,"docs":[]},"facet_counts":{"facet_queries":{},"facet_fields":{"cat":["numFacetTerms",14,"counts",["electronics",14,"memory",3,"connector",2,"graphics card",2]]},"facet_dates":{},"facet_ranges":{}}}
{code}
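
In the JSON output, the counts for cat arrive as one array that alternates term names and counts. A client usually wants a map instead. The sketch below shows one way to do that conversion, assuming the array has already been parsed into a List by whatever JSON library is in use; the class name is hypothetical and the sample data is copied from the output above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Solr or SolrJ.
final class FlatFacetList {

    /** Converts [name, count, name, count, ...] into an ordered name -> count map. */
    static Map<String, Integer> toMap(List<Object> flat) {
        if (flat.size() % 2 != 0) {
            throw new IllegalArgumentException("expected alternating name/count pairs");
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < flat.size(); i += 2) {
            String name = (String) flat.get(i);
            int count = ((Number) flat.get(i + 1)).intValue();
            counts.put(name, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The "counts" array from the sample response.
        List<Object> flat = Arrays.<Object>asList(
                "electronics", 14, "memory", 3, "connector", 2, "graphics card", 2);
        System.out.println(toMap(flat));
    }
}
```

The map holds only the four terms allowed by facet.limit=4, while numFacetTerms in the same response reports 14, so the two numbers answer different questions.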

 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2242-distinctFacet.patch


 When returning facet.field=<name of field> you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=manu&facet.mincount=1&facet.limit=-1&f.manu.facet.namedistinct=0&facet.field=price&f.price.facet.namedistinct=1
 Here is an example on field hgid (without namedistinct):
 {code}
 <lst name="facet_fields">
   <lst name="hgid">
     <int name="HGPY045FD36D4000A">1</int>
     <int name="HGPY0FBC6690453A9">1</int>
     <int name="HGPY1E44ED6C4FB3B">1</int>
     <int name="HGPY1FA631034A1B8">1</int>
     <int name="HGPY3317ABAC43B48">1</int>
     <int name="HGPY3A17B2294CB5A">5</int>
     <int name="HGPY3ADD2B3D48C39">1</int>
   </lst>
 </lst>
 {code}
 With namedistinct (HGPY045FD36D4000A, HGPY0FBC6690453A9, 
 HGPY1E44ED6C4FB3B, HGPY1FA631034A1B8, HGPY3317ABAC43B48, 
 HGPY3A17B2294CB5A, HGPY3ADD2B3D48C39). This returns the number of rows 
 (7), not the number of values (11).
 {code}
 <lst name="facet_fields">
   <lst name="hgid">
     <int name="_count_">7</int>
   </lst>
 </lst>
 {code}
 This actually works really well for getting the total number of fields for a 
 group.field=hgid. Enjoy!
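
The difference between the number of rows (7) and the number of values (11) in the hgid example can be checked with a small sketch. This is only an illustration of the arithmetic using the counts quoted above, not the patch's implementation; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not code from the SOLR-2242 patch.
final class NameDistinctSketch {

    /** Number of distinct facet names, i.e. rows in the facet list. */
    static int distinctNames(Map<String, Integer> counts) {
        return counts.size();
    }

    /** Sum of the per-name counts, i.e. the number of values. */
    static int totalValues(Map<String, Integer> counts) {
        int sum = 0;
        for (int c : counts.values()) {
            sum += c;
        }
        return sum;
    }

    /** The hgid facet counts from the example in the issue description. */
    static Map<String, Integer> sampleHgidCounts() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("HGPY045FD36D4000A", 1);
        counts.put("HGPY0FBC6690453A9", 1);
        counts.put("HGPY1E44ED6C4FB3B", 1);
        counts.put("HGPY1FA631034A1B8", 1);
        counts.put("HGPY3317ABAC43B48", 1);
        counts.put("HGPY3A17B2294CB5A", 5);
        counts.put("HGPY3ADD2B3D48C39", 1);
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = sampleHgidCounts();
        System.out.println("names:  " + distinctNames(counts));
        System.out.println("values: " + totalValues(counts));
    }
}
```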

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2242:


Attachment: SOLR-2242.v2.patch

v2 of the release based on feedback.




[jira] Issue Comment Edited: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006806#comment-13006806
 ] 

Bill Bell edited comment on SOLR-2242 at 3/15/11 6:06 AM:
--

v2 of the release based on feedback.

Note: SOLR-2242-distinctFacet.patch not needed (left for history)




[jira] Issue Comment Edited: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006779#comment-13006779
 ] 

Bill Bell edited comment on SOLR-2242 at 3/15/11 6:06 AM:
--

No actually namedistinct is not the number of values. It is the number of names.

{code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="HGPY045FD36D4000A">1</int>
    <int name="HGPY0FBC6690453A9">1</int>
    <int name="HGPY1E44ED6C4FB3B">1</int>
    <int name="HGPY1FA631034A1B8">1</int>
    <int name="HGPY3317ABAC43B48">1</int>
    <int name="HGPY3A17B2294CB5A">5</int>
    <int name="HGPY3ADD2B3D48C39">1</int>
  </lst>
</lst>
{code}

Becomes:

{code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="namedistinct">7</int>  <!-- this is not 11 -->
    <lst name="counts">
      <int name="HGPY045FD36D4000A">1</int>
      <int name="HGPY0FBC6690453A9">1</int>
      <int name="HGPY1E44ED6C4FB3B">1</int>
      <int name="HGPY1FA631034A1B8">1</int>
      <int name="HGPY3317ABAC43B48">1</int>
      <int name="HGPY3A17B2294CB5A">5</int>
      <int name="HGPY3ADD2B3D48C39">1</int>
    </lst>
  </lst>
</lst>
{code}


  was (Author: billnbell):
No actually namedistinct is not the number of values. It is the number of 
names.

{code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="HGPY045FD36D4000A">1</int>
    <int name="HGPY0FBC6690453A9">1</int>
    <int name="HGPY1E44ED6C4FB3B">1</int>
    <int name="HGPY1FA631034A1B8">1</int>
    <int name="HGPY3317ABAC43B48">1</int>
    <int name="HGPY3A17B2294CB5A">5</int>
    <int name="HGPY3ADD2B3D48C39">1</int>
  </lst>
</lst>
{code}

Becomes:

{code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="namedistinct">7</int>  <!-- this is not 11 -->
    <lst name="hgid">
      <int name="HGPY045FD36D4000A">1</int>
      <int name="HGPY0FBC6690453A9">1</int>
      <int name="HGPY1E44ED6C4FB3B">1</int>
      <int name="HGPY1FA631034A1B8">1</int>
      <int name="HGPY3317ABAC43B48">1</int>
      <int name="HGPY3A17B2294CB5A">5</int>
      <int name="HGPY3ADD2B3D48C39">1</int>
    </lst>
  </lst>
</lst>
{code}

  



[jira] Issue Comment Edited: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006805#comment-13006805
 ] 

Bill Bell edited comment on SOLR-2242 at 3/15/11 6:10 AM:
--

OK this is complete.

Sample query:

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=cat&rows=0&facet.numfacetterms=2&facet.limit=4

Sample output:
{code}
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="facet.numfacetterms">2</str>
      <str name="facet">true</str>
      <str name="q">*:*</str>
      <str name="facet.limit">4</str>
      <str name="facet.field">cat</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="17" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="cat">
        <int name="numFacetTerms">14</int>
        <lst name="counts">
          <int name="electronics">14</int>
          <int name="memory">3</int>
          <int name="connector">2</int>
          <int name="graphics card">2</int>
        </lst>
      </lst>
    </lst>
    <lst name="facet_dates"/>
    <lst name="facet_ranges"/>
  </lst>
</response>
{code}

In Json:

{code}
"facet_fields":{"cat":["numFacetTerms",14,"counts",["electronics",14,"memory",3,"connector",2,"graphics card",2]]},"facet_dates":{},"facet_ranges":{}}}
{code}




[jira] Commented: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006807#comment-13006807
 ] 

Otis Gospodnetic commented on SOLR-2242:


Would this be more consistent? facet.numfacetterms => facet.numFacetTerms




[jira] Issue Comment Edited: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006805#comment-13006805
 ] 

Bill Bell edited comment on SOLR-2242 at 3/15/11 6:16 AM:
--

OK this is complete.

Sample query:

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=cat&rows=0&facet.numfacetterms=2&facet.limit=4

Sample output:
{code}
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="facet.numfacetterms">2</str>
      <str name="facet">true</str>
      <str name="q">*:*</str>
      <str name="facet.limit">4</str>
      <str name="facet.field">cat</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="17" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="cat">
        <int name="numFacetTerms">14</int>
        <lst name="counts">
          <int name="electronics">14</int>
          <int name="memory">3</int>
          <int name="connector">2</int>
          <int name="graphics card">2</int>
        </lst>
      </lst>
    </lst>
    <lst name="facet_dates"/>
    <lst name="facet_ranges"/>
  </lst>
</response>
{code}

In Json:

{code}
"facet_fields":{"cat":["numFacetTerms",14,"counts",["electronics",14,"memory",3,"connector",2,"graphics card",2]]},"facet_dates":{},"facet_ranges":{}}}
{code}




[jira] Commented: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006808#comment-13006808
 ] 

Bill Bell commented on SOLR-2242:
-

Maybe, but I thought all params were supposed to be lower case?

I can easily change that.




[jira] Commented: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006809#comment-13006809
 ] 

Bill Bell commented on SOLR-2242:
-

I am changing it, since there is already one example of mixed upper/lower case:

facet.enum.cache.minDf






[jira] Issue Comment Edited: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006805#comment-13006805
 ] 

Bill Bell edited comment on SOLR-2242 at 3/15/11 6:20 AM:
--

OK this is complete.

Sample query:

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=cat&rows=0&facet.numFacetTerms=2&facet.limit=4

Sample output:
{code}
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="facet.numfacetterms">2</str>
      <str name="facet">true</str>
      <str name="q">*:*</str>
      <str name="facet.limit">4</str>
      <str name="facet.field">cat</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="17" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="cat">
        <int name="numFacetTerms">14</int>
        <lst name="counts">
          <int name="electronics">14</int>
          <int name="memory">3</int>
          <int name="connector">2</int>
          <int name="graphics card">2</int>
        </lst>
      </lst>
    </lst>
    <lst name="facet_dates"/>
    <lst name="facet_ranges"/>
  </lst>
</response>
{code}

In Json:

{code}
"facet_fields":{"cat":["numFacetTerms",14,"counts",["electronics",14,"memory",3,"connector",2,"graphics card",2]]},"facet_dates":{},"facet_ranges":{}}}
{code}




Re: Problem of Replication Reservation Duration

2011-03-15 Thread Li Li
The original logic is correct. I read the code and found that my
understanding was incorrect.
The ReplicationHandler reserves the version currently being fetched every 5 packets:

if (indexVersion != null && (packetsWritten % 5 == 0)) {
  //after every 5 packets reserve the commitpoint for some time
  delPolicy.setReserveDuration(indexVersion, reserveCommitDuration);
}

So the extreme case I described will never happen.
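
To see why renewing the reservation every 5 packets keeps a long transfer alive, here is a small stand-alone simulation. It is a sketch under stated assumptions: `ReserveSketch`, its methods, and the timing model are hypothetical and only mimic the idea of the snippet above; they are not the actual Solr classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a deletion policy that keeps reserved commit points.
final class ReserveSketch {
    private final Map<Long, Long> reservedUntilMs = new HashMap<>();

    // Reserve a commit version until nowMs + durationMs; never shortens a reservation.
    void setReserveDuration(long version, long nowMs, long durationMs) {
        long until = nowMs + durationMs;
        Long previous = reservedUntilMs.get(version);
        if (previous == null || previous < until) {
            reservedUntilMs.put(version, until);
        }
    }

    boolean isReserved(long version, long nowMs) {
        Long until = reservedUntilMs.get(version);
        return until != null && until > nowMs;
    }

    // Simulates sending `packets` packets, one every msPerPacket milliseconds,
    // renewing the reservation after every 5th packet.
    // Returns true if the reservation never lapsed during the transfer.
    static boolean survivesTransfer(int packets, long msPerPacket, long reserveMs) {
        ReserveSketch policy = new ReserveSketch();
        long version = 1L;
        long now = 0L;
        policy.setReserveDuration(version, now, reserveMs); // initial reservation
        for (int packetsWritten = 1; packetsWritten <= packets; packetsWritten++) {
            now += msPerPacket;
            if (!policy.isReserved(version, now)) {
                return false;
            }
            if (packetsWritten % 5 == 0) {
                policy.setReserveDuration(version, now, reserveMs);
            }
        }
        return true;
    }
}
```

In this model a transfer only loses its reservation if five consecutive packets take longer than the reserve duration, regardless of how long the whole transfer runs.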

2011/3/11 Li Li fancye...@gmail.com:
 -- Forwarded message --
 From: Li Li fancye...@gmail.com
 Date: 2011/3/11
 Subject: Problem of Replication Reservation Duration
 To: solr-...@lucene.apache.org


 hi all,
     The replication handler in Solr 1.4, which we use, seems to be a
 little problematic in some extreme situations.
     The default reserve duration is 10s and cannot be modified by any method:
       private Integer reserveCommitDuration =
 SnapPuller.readInterval("00:00:10");
     The current implementation is: the slave sends an HTTP
 request (CMD_GET_FILE_LIST) asking the master to list the current index files.
     In its response code, the master reserves this commit for 10s:
       // reserve the indexcommit for sometime
       core.getDeletionPolicy().setReserveDuration(version,
 reserveCommitDuration);
     If the master's index changes within 10s, the old version
 will not be deleted. Otherwise, the old version will be deleted.
     The slave then fetches the files in the list one by one.
     Consider the following situation.
     Every midnight we optimize the whole index into one single
 index, and every 15 minutes we add new segments to it.
     When the slave copies the large optimized index, it takes
 more than 15 minutes. So it fails to copy all the files and
 retries 5 minutes later. But each time it re-copies all the files
 into a new tmp directory, so it will fail again and again as long as
 we update the index every 15 minutes.
     We can tackle this problem by setting reserveCommitDuration to 20
 minutes. But because we update a small number of
 documents very frequently, many useless index versions will be reserved,
 which wastes disk space.
     Has anyone encountered this problem before, and is there a solution for it?
     We came up with an ugly solution like this: the slave fetches files using
 multiple threads, one thread per file. Thus the master will open all the
 files that the slave needs. As long as a file is open, when the master
 wants to delete it, the file is unlinked but its inode
 reference count stays above 0, so its data remains readable. Because reading
 too many files at once will reduce the master's capacity, we want to use
 some synchronization mechanism to allow only 1 or 2 ReplicationHandler
 threads to run the CMD_GET_FILE command at a time.
     Is that solution feasible?
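
The "only 1 or 2 threads at a time" part of this proposal could be sketched with a counting semaphore. This is a minimal illustration, assuming a hypothetical `FileTransferThrottle` class; it is not ReplicationHandler code and it does not address the commit-deletion question itself.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical throttle; class and method names are assumptions, not Solr APIs.
final class FileTransferThrottle {
    private final Semaphore permits;
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger maxObserved = new AtomicInteger();

    FileTransferThrottle(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent, true); // fair: roughly arrival order
    }

    // Runs sendFile while holding a permit; blocks while maxConcurrent transfers are running.
    void transfer(Runnable sendFile) throws InterruptedException {
        permits.acquire();
        int running = active.incrementAndGet();
        maxObserved.accumulateAndGet(running, Math::max);
        try {
            sendFile.run();
        } finally {
            active.decrementAndGet();
            permits.release();
        }
    }

    int maxObserved() {
        return maxObserved.get();
    }

    // Starts `threads` simulated transfers and returns the highest concurrency seen.
    static int demo(int threads, int maxConcurrent) {
        FileTransferThrottle throttle = new FileTransferThrottle(maxConcurrent);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    throttle.transfer(() -> sleepQuietly(10));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return throttle.maxObserved();
    }

    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A semaphore is used rather than a single lock so that the limit (1 or 2) stays a constructor argument.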





[jira] Commented: (LUCENE-2749) Co-occurrence filter

2011-03-15 Thread Elmar Pitschke (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13006812#comment-13006812
 ] 

Elmar Pitschke commented on LUCENE-2749:


Hi Steven,
thanks for the info, I will work through it and get back here with some 
questions.
As I have a lot to do with Lucene at my work, this filter would definitely be 
something that I could use. So the work would not be lost ;)
Regards
   Elmar

 Co-occurrence filter
 

 Key: LUCENE-2749
 URL: https://issues.apache.org/jira/browse/LUCENE-2749
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Analysis
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Priority: Minor
 Fix For: 4.0


 The co-occurrence filter to be developed here will output sets of tokens that 
 co-occur within a given window onto a token stream.  
 These token sets can be ordered either lexically (to allow order-independent 
 matching/counting) or positionally (e.g. sliding windows of positionally 
 ordered co-occurring terms that include all terms in the window are called 
 n-grams or shingles). 
 The parameters to this filter will be: 
 * window size: this can be a fixed sequence length, sentence/paragraph 
 context (these will require sentence/paragraph segmentation, which is not in 
 Lucene yet), or over the entire token stream (full field width)
 * minimum number of co-occurring terms: >= 2
 * maximum number of co-occurring terms: <= window size
 * token set ordering (lexical or positional)
 One use case for co-occurring token sets is as candidates for collocations.
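
The lexically ordered variant with a fixed window can be sketched in plain Java. This is an illustration of the idea only, under the assumption of a small fixed window: `CoOccurrenceSketch` is a hypothetical name, it works on a list of strings rather than a Lucene token stream, and it is not the filter proposed in this issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; operates on plain strings, not on a Lucene TokenStream.
final class CoOccurrenceSketch {
    // Returns every lexically ordered term set of size minTerms..maxTerms whose
    // members co-occur within some window of `window` consecutive tokens.
    // Assumes window is small (well under 31), since subsets are enumerated by bitmask.
    static Set<List<String>> lexicalSets(List<String> tokens, int window,
                                         int minTerms, int maxTerms) {
        Set<List<String>> out = new LinkedHashSet<>();
        int lastStart = Math.max(0, tokens.size() - window);
        for (int start = 0; start <= lastStart; start++) {
            int end = Math.min(tokens.size(), start + window);
            List<String> win = tokens.subList(start, end);
            int n = win.size();
            for (int mask = 1; mask < (1 << n); mask++) {
                int size = Integer.bitCount(mask);
                if (size < minTerms || size > maxTerms) {
                    continue;
                }
                TreeSet<String> set = new TreeSet<>(); // TreeSet gives lexical order
                for (int i = 0; i < n; i++) {
                    if ((mask & (1 << i)) != 0) {
                        set.add(win.get(i));
                    }
                }
                if (set.size() == size) { // skip selections that repeat a term
                    out.add(new ArrayList<>(set));
                }
            }
        }
        return out;
    }
}
```

Because each set is sorted before it is added, "b a" and "a b" produce the same entry, which is what allows order-independent matching and counting.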




ClassCastException SOLR 1709 Distributed Date Faceting

2011-03-15 Thread Viswa S

Folks,

I applied the 4.x patch onto trunk and compiled. However, there seems to be a 
runtime exception, as shown below.

Thanks
Viswa

type Status report

message java.util.Date cannot be cast to java.lang.Integer

java.lang.ClassCastException: java.util.Date cannot be cast to java.lang.Integer
 at org.apache.solr.handler.component.FacetComponent.countFacets(FacetComponent.java:294)
 at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:232)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:326)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1325)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
 at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
 at java.lang.Thread.run(Unknown Source)

description The server encountered an internal error (java.util.Date cannot be 
cast to java.lang.Integer) that prevented it from fulfilling this request.
  

[jira] Created: (SOLR-2426) Build failing

2011-03-15 Thread Bill Bell (JIRA)
Build failing
-

 Key: SOLR-2426
 URL: https://issues.apache.org/jira/browse/SOLR-2426
 Project: Solr
  Issue Type: Bug
Reporter: Bill Bell


ant clean
ant example
trunk
[javac]  ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSetHitCollector.java:77: incompatible types
[javac] found   : org.apache.solr.search.BitDocSet
[javac] required: org.apache.solr.search.DocSet
[javac]   return new BitDocSet(bits,pos);
[javac]  ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSetHitCollector.java:132: incompatible types
[javac] found   : org.apache.solr.search.SortedIntDocSet
[javac] required: org.apache.solr.search.DocSet
[javac]   return new SortedIntDocSet(scratch, pos);
[javac]  ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSetHitCollector.java:136: incompatible types
[javac] found   : org.apache.solr.search.BitDocSet
[javac] required: org.apache.solr.search.DocSet
[javac]   return new BitDocSet(bits,pos);
[javac]  ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:26: org.apache.solr.search.DocSlice is not abstract and does not override abstract method getTopFilter() in org.apache.solr.search.DocSet
[javac] public class DocSlice extends DocSetBase implements DocList {
[javac]^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:54: incompatible types
[javac] found   : org.apache.solr.search.DocSlice
[javac] required: org.apache.solr.search.DocList
[javac] if (this.offset == offset && this.len==len) return this;
[javac]^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:62: incompatible types
[javac] found   : org.apache.solr.search.DocSlice
[javac] required: org.apache.solr.search.DocList
[javac] if (this.offset == offset && this.len == realLen) return this;
[javac]  ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:63: incompatible types
[javac] found   : org.apache.solr.search.DocSlice
[javac] required: org.apache.solr.search.DocList
[javac] return new DocSlice(offset, realLen, docs, scores, matches, maxScore);
[javac]^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:130: intersection(org.apache.solr.search.DocSet) in org.apache.solr.search.DocSet cannot be applied to (org.apache.solr.search.DocSlice)
[javac]   return other.intersection(this);
[javac]   ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:139: intersectionSize(org.apache.solr.search.DocSet) in org.apache.solr.search.DocSet cannot be applied to (org.apache.solr.search.DocSlice)
[javac]   return other.intersectionSize(this);
[javac]   ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\ExtendedDismaxQParserPlugin.java:829: warning: [unchecked] unchecked conversion
[javac] found   : java.util.List
[javac] required: java.util.List<org.apache.lucene.search.BooleanClause>
[javac]   Query q = super.getBooleanQuery(clauses, disableCoord);
[javac]   ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\ExtendedDismaxQParserPlugin.java:845: warning: [unchecked] unchecked conversion
[javac] found   : java.util.List
[javac] required: java.util.List<org.apache.lucene.search.BooleanClause>
[javac]   super.addClause(clauses, conj, mods, q);
[javac]   ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\FastLRUCache.java:107: warning: [unchecked] unchecked cast
[javac] found   : java.lang.Object
[javac] required: java.util.List<org.apache.solr.common.util.ConcurrentLRUCache.Stats>
[javac] statsList = (List<ConcurrentLRUCache.Stats>) persistence;
[javac]  ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\FastLRUCache.java:263: warning: [unchecked] unchecked cast
[javac] found   : java.util.Set
[javac] required: java.util.Set<java.util.Map.Entry>
[javac]   for (Map.Entry e : (Set<Map.Entry>)items.entrySet()) {
[javac] ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\Grouping.java:61: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of the raw type org.apache.solr.common.util.NamedList
[javac]   grouped.add(key, groupResult);  // grouped={ key={
[javac]  ^

[jira] Updated: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2242:


Attachment: (was: SOLR-2242.v2.patch)




[jira] Updated: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2242:


Attachment: SOLR.2242.v2.patch

New ver




[jira] Issue Comment Edited: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13006792#comment-13006792
 ] 

Bill Bell edited comment on SOLR-2242 at 3/15/11 8:22 AM:
--

I am going to use your suggestion. You will not have to set the limit. Getting 
the numFacetTerms will be optional, and you also will be able to NOT get the 
hgids as well. I propose this (please comment):

This will ONLY output the numFacetTerms (no hgid facet counts):
http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=hgid&f.hgid.facet.numFacetTerms=1

This assumes the count will be limit=-1

{code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="numFacetTerms">7</int>  <!-- this is not 11 -->
  </lst>
</lst>
{code}

This will output the numFacetTerms AND hgid:
http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=hgid&facet.mincount=1&f.hgid.facet.numFacetTerms=2

{code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="numFacetTerms">7</int>  <!-- this is not 11 -->
    <lst name="counts">
      <int name="HGPY045FD36D4000A">1</int>
      <int name="HGPY0FBC6690453A9">1</int>
      <int name="HGPY1E44ED6C4FB3B">1</int>
      <int name="HGPY1FA631034A1B8">1</int>
      <int name="HGPY3317ABAC43B48">1</int>
      <int name="HGPY3A17B2294CB5A">5</int>
      <int name="HGPY3ADD2B3D48C39">1</int>
    </lst>
  </lst>
</lst>
{code}

  was (Author: billnbell):
I am going to use your suggestion. You will not have to set the limit. 
Getting the numFacetTerms will be optional, and you also will be able to NOT 
get the hgids as well. I propose this (please comment):

This will ONLY output the numFacetTerms (no hgid facet counts):
http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=hgid&f.hgid.facet.numfacetterms=1

This assumes the count will be limit=-1

{code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="numFacetTerms">7</int>  <!-- this is not 11 -->
  </lst>
</lst>
{code}

This will output the numFacetTerms AND hgid:
http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=hgid&facet.mincount=1&f.hgid.facet.numfacetterms=2

{code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="numFacetTerms">7</int>  <!-- this is not 11 -->
    <lst name="counts">
      <int name="HGPY045FD36D4000A">1</int>
      <int name="HGPY0FBC6690453A9">1</int>
      <int name="HGPY1E44ED6C4FB3B">1</int>
      <int name="HGPY1FA631034A1B8">1</int>
      <int name="HGPY3317ABAC43B48">1</int>
      <int name="HGPY3A17B2294CB5A">5</int>
      <int name="HGPY3ADD2B3D48C39">1</int>
    </lst>
  </lst>
</lst>
{code}
  



[jira] Updated: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2242:


Attachment: (was: SOLR-2242-distinctFacet.patch)




[jira] Updated: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2242:


Comment: was deleted

(was: Maybe, but I thought all params were supposed to be lower case?

I can easily change that ??)




[jira] Updated: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2242:


Comment: was deleted

(was: New ver)




[jira] Updated: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2242:


Comment: was deleted

(was: v2 of the release based on feedback.

Note: SOLR-2242-distinctFacet.patch not needed (left for history))




Re: I want to take part in Google Summer Code 2011

2011-03-15 Thread Anurag
I did a project in which I crawled data with Nutch-1.0 and indexed it into
Apache Solr to build a search engine with a proper UI (autosuggest,
spellcheck), running on a Tomcat server.

Now we are extending the project to include novel fuzzy queries using the OWA
operator, such as "at least half", "as many as possible", etc. This is different
from the usual boolean search. We are referring to a paper presented by our
respected Prof. M.M. Sufyan Beg. This will be implemented in Apache Solr.

-
Kumar Anurag

--
View this message in context: 
http://lucene.472066.n3.nabble.com/I-want-to-take-part-in-Google-Summer-Code-2011-tp2668316p2680987.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.




[jira] Commented: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks

2011-03-15 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13006871#comment-13006871
 ] 

Simon Willnauer commented on LUCENE-2573:
-

bq. I still see a healtiness (mis-spelled) in DW.
ugh. I will fix
{quote}
I'd rather not have the stalling/healthiness be baked into the API, at
all. Can we put the hijack logic entirely private in the flush-by-ram
policies? (Ie remove isStalled()/hijackThreadsForFlush()).
{quote}

I agree on the hijack part, but isStalled is something I might want to 
control. I mean, we can still open it up eventually, so let's rather make it 
private for now but keep a note on it.

{quote}
Can we move FlushSpecification out of FlushPolicy? Ie, it's a private
impl detail of DW right? (Not part of FlushPolicy's API). Actually
why do we need it? Can't we just return the DWPT?
{quote}

It currently holds the RAM usage for that DWPT when it was checked out so that 
I can reduce the flushBytes accordingly. We can maybe get rid of it entirely, 
but I don't want to rely on the DWPT's bytesUsed() though.
We can certainly move it out; this inner class is a relic.

bq. Why do we have a separate DocWriterSession? Can't this be absorbed
into DocWriter?

I generally don't like cluttering DocWriter and letting it grow like IW. 
DocWriterSession might not be the ideal name for this class, but it's really a 
RAM tracker for this DW. Still, we can move out some parts that do not directly 
relate to memory tracking. Maybe DocWriterBytes?

bq. Be careful defaulting TermsHash.trackAllocations to true – eg term
vectors wants this to be false.

I need to go through the IndexingChain and check carefully where to track 
memory anyway. I haven't gotten to that yet, but it's good that you mention it; 
that one could easily get lost.

bq. Instead of FlushPolicy.message, can't the policy call DW.message?
I don't want to couple that API to DW. What would be the benefit apart from 
saving a single method?
{quote}
On the by-RAM flush policies... when you hit the high water mark, we
should 1) flush all DWPTs and 2) stall any other threads.
{quote}
Well, I am not sure we should do that. I don't really see why we should 
forcefully stop the world here. Incoming threads will pick up a flush 
immediately, and if we have enough resources to index further, why should we 
wait until all DWPTs are flushed? If we stall, I fear that we could queue up 
threads that could help with flushing, while stalling would simply stop them 
from doing anything, right? You can still control this with the healthiness 
though. We currently do flush all DWPTs once we hit the HW, btw. 

{quote}
Why do we dereference the DWPTs with their ord? EG, can't we just
store their state (active or flushPending) on the DWPT instead of in
a separate states[]?
{quote}
That is definitely an option. I will give that a go.
{quote}
Do we really need FlushState.Aborted? And if not... do we really need
FlushState (since it just becomes 2 states, ie, Active or Flushing,
which I think is then redundant w/ flushPending boolean?).
{quote}
This needs some more refactoring; I will attach another iteration.
{quote}
I think the default low water should be 1X of your RAM buffer? And
high water maybe 2X? (For both flush-by-RAM policies).
{quote}
Hmm, I think we need to revise the maxRAMBufferMB Javadoc anyway, so we have all 
the freedom to do whatever we want. Still, I think we should try to keep the RAM 
consumption similar to what a previous release would have used. If we 
say HW is 2x, then suddenly some apps might run out of memory. I am not sure if 
we should do that or rather stick to the 90% to 110% for now. We need to find 
good defaults for this anyway.


 Tiered flushing of DWPTs by RAM with low/high water marks
 -

 Key: LUCENE-2573
 URL: https://issues.apache.org/jira/browse/LUCENE-2573
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Simon Willnauer
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
 LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch


 Now that we have DocumentsWriterPerThreads we need to track total consumed 
 RAM across all DWPTs.
 A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
 tiered approach:  
 - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
 - Flush all DWPTs at a high water mark (e.g. at 110%)
 - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
 used, flush at 90%, 95%, 100%, 105% and 110%.
 Should we allow the user to configure the low and high water mark values 
 explicitly using total values (e.g. low water mark at 120MB, high water mark 
 at 140MB)?  Or shall we 
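
A rough illustration of the linear stepping described in the issue text above, as a standalone Java sketch. The class TieredFlushThresholds and its method are hypothetical and not part of Lucene. The water marks are passed as integer percentages of the RAM budget, and falling back to the low mark when only one DWPT exists is an assumption made here.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not a Lucene class. */
final class TieredFlushThresholds {

  /**
   * Returns one flush trigger per DWPT, in bytes, spaced linearly between
   * lowPct and highPct percent of the RAM budget (both inclusive).
   */
  static long[] thresholds(long ramBudgetBytes, int lowPct, int highPct, int numDwpts) {
    if (numDwpts < 1 || lowPct > highPct) {
      throw new IllegalArgumentException("need numDwpts >= 1 and lowPct <= highPct");
    }
    long[] result = new long[numDwpts];
    if (numDwpts == 1) {
      // Assumption: a single DWPT is flushed at the low water mark.
      result[0] = ramBudgetBytes * lowPct / 100;
      return result;
    }
    long steps = numDwpts - 1;
    for (int i = 0; i < numDwpts; i++) {
      // Percentage scaled by 'steps' so the arithmetic stays in integers.
      long scaledPct = (long) lowPct * steps + (long) i * (highPct - lowPct);
      result[i] = ramBudgetBytes * scaledPct / (100 * steps);
    }
    return result;
  }

  public static void main(String[] args) {
    // 5 DWPTs, 90% to 110% of a 1000-byte budget.
    System.out.println(Arrays.toString(thresholds(1000, 90, 110, 5)));
  }
}
```

With five DWPTs and a budget of 1000 bytes this yields 900, 950, 1000, 1050 and 1100, which corresponds to the 90%, 95%, 100%, 105% and 110% steps given in the issue.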

Solr query POST and not in GET

2011-03-15 Thread Gastone Penzo
Hi,
Is it possible to change the method Solr uses for sending queries from GET to
POST? My query has a lot of OR..OR..OR clauses and the log says "Request URI
too large".
Where can I change it?
Thanks




-- 
Gastone Penzo

www.solr-italia.it
The first italian blog about SOLR
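
One possible way to approach the question above, sketched in plain Java: Solr's select handler generally accepts the same parameters in a form-encoded POST body, so a long query does not have to travel in the URI. The class name SolrPostQuery is hypothetical, and the URL passed to post() is whatever your own installation uses (the examples elsewhere in this archive use http://localhost:8983/solr/select). SolrJ users can probably get the same effect by passing SolrRequest.METHOD.POST to query(...), but that is not shown here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.Map;

/** Hypothetical helper for illustration only. */
final class SolrPostQuery {

  /** Encodes parameters as application/x-www-form-urlencoded, in map order. */
  static String formEncode(Map<String, String> params) {
    StringBuilder body = new StringBuilder();
    try {
      for (Map.Entry<String, String> e : params.entrySet()) {
        if (body.length() > 0) {
          body.append('&');
        }
        body.append(URLEncoder.encode(e.getKey(), "UTF-8"))
            .append('=')
            .append(URLEncoder.encode(e.getValue(), "UTF-8"));
      }
    } catch (UnsupportedEncodingException ex) {
      // UTF-8 is required on every Java platform, so this should not happen.
      throw new IllegalStateException(ex);
    }
    return body.toString();
  }

  /** Sends the parameters in the request body and returns the HTTP status. */
  static int post(String selectUrl, Map<String, String> params) throws IOException {
    byte[] body = formEncode(params).getBytes("UTF-8");
    HttpURLConnection conn = (HttpURLConnection) new URL(selectUrl).openConnection();
    conn.setRequestMethod("POST");
    conn.setDoOutput(true);
    conn.setRequestProperty("Content-Type",
        "application/x-www-form-urlencoded; charset=UTF-8");
    OutputStream out = conn.getOutputStream();
    try {
      out.write(body);
    } finally {
      out.close();
    }
    return conn.getResponseCode();
  }
}
```

Because the query string moves into the body, the request line stays short regardless of how many OR clauses the query contains.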


[jira] Commented: (LUCENE-2957) generate-maven-artifacts target should include all non-Mavenized Lucene & Solr dependencies

2011-03-15 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006878#comment-13006878
 ] 

Dawid Weiss commented on LUCENE-2957:
-

Hi Steven. This issue is closed, but just to mark it for the future: I've added 
a retrowoven version of Carrot2-core, it will be part of maintenance release 
3.4.4:
https://oss.sonatype.org/content/repositories/snapshots/org/carrot2/carrot2-core/3.4.4-SNAPSHOT/

The -jdk15 classifier is the one working with Java 1.5 (I checked with our 
examples and they work fine, so there should be no problems with it in SOLR).

 generate-maven-artifacts target should include all non-Mavenized Lucene & 
 Solr dependencies
 ---

 Key: LUCENE-2957
 URL: https://issues.apache.org/jira/browse/LUCENE-2957
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.1, 3.2, 4.0
Reporter: Steven Rowe
Assignee: Steven Rowe
Priority: Minor
 Fix For: 3.1, 3.2, 4.0

 Attachments: LUCENE-2923-part3.patch, LUCENE-2957-part2.patch, 
 LUCENE-2957.patch


 Currently, in addition to deploying artifacts for all of the Lucene and Solr 
 modules to a repository (by default local), the {{generate-maven-artifacts}} 
 target also deploys artifacts for the following non-Mavenized Solr 
 dependencies (lucene_solr_3_1 version given here):
 # {{solr/lib/commons-csv-1.0-SNAPSHOT-r966014.jar}} as 
 org.apache.solr:solr-commons-csv:3.1
 # {{solr/lib/apache-solr-noggit-r944541.jar}} as 
 org.apache.solr:solr-noggit:3.1
 \\ \\
 The following {{.jar}}'s should be added to the above list (lucene_solr_3_1 
 version given here):
 \\ \\
 # {{lucene/contrib/icu/lib/icu4j-4_6.jar}}
 # 
 {{lucene/contrib/benchmark/lib/xercesImpl-2.9.1-patched-XERCESJ}}{{-1257.jar}}
 # {{solr/contrib/clustering/lib/carrot2-core-3.4.2.jar}}**
 # {{solr/contrib/uima/lib/uima-an-alchemy.jar}}
 # {{solr/contrib/uima/lib/uima-an-calais.jar}}
 # {{solr/contrib/uima/lib/uima-an-tagger.jar}}
 # {{solr/contrib/uima/lib/uima-an-wst.jar}}
 # {{solr/contrib/uima/lib/uima-core.jar}}
 \\ \\
 I think it makes sense to follow the same model as the current non-Mavenized 
 dependencies:
 \\ \\
 * {{groupId}} = {{org.apache.solr/.lucene}}
 * {{artifactId}} = {{solr-/lucene-}}original-name,
 * {{version}} = lucene-solr-release-version.
 **The carrot2-core jar doesn't need to be included in trunk's release 
 artifacts, since there already is a Mavenized Java6-compiled jar.  branch_3x 
 and lucene_solr_3_1 will need this Solr-specific Java5-compiled maven 
 artifact, though.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Resolved: (SOLR-2426) Build failing

2011-03-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-2426.
---

Resolution: Not A Problem

Trunk requires java 6.

 Build failing
 -

 Key: SOLR-2426
 URL: https://issues.apache.org/jira/browse/SOLR-2426
 Project: Solr
  Issue Type: Bug
Reporter: Bill Bell

 ant clean
 ant example
 trunk
 [javac]  ^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSetHitCollector.java:77: incompatible types
 [javac] found   : org.apache.solr.search.BitDocSet
 [javac] required: org.apache.solr.search.DocSet
 [javac]   return new BitDocSet(bits,pos);
 [javac]  ^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSetHitCollector.java:132: incompatible types
 [javac] found   : org.apache.solr.search.SortedIntDocSet
 [javac] required: org.apache.solr.search.DocSet
 [javac]   return new SortedIntDocSet(scratch, pos);
 [javac]  ^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSetHitCollector.java:136: incompatible types
 [javac] found   : org.apache.solr.search.BitDocSet
 [javac] required: org.apache.solr.search.DocSet
 [javac]   return new BitDocSet(bits,pos);
 [javac]  ^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:26: org.apache.solr.search.DocSlice is not abstract and does not override abstract method getTopFilter() in org.apache.solr.search.DocSet
 [javac] public class DocSlice extends DocSetBase implements DocList {
 [javac]^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:54: incompatible types
 [javac] found   : org.apache.solr.search.DocSlice
 [javac] required: org.apache.solr.search.DocList
 [javac] if (this.offset == offset && this.len==len) return this;
 [javac]^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:62: incompatible types
 [javac] found   : org.apache.solr.search.DocSlice
 [javac] required: org.apache.solr.search.DocList
 [javac] if (this.offset == offset && this.len == realLen) return this;
 [javac]  ^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:63: incompatible types
 [javac] found   : org.apache.solr.search.DocSlice
 [javac] required: org.apache.solr.search.DocList
 [javac] return new DocSlice(offset, realLen, docs, scores, matches, maxScore);
 [javac]^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:130: intersection(org.apache.solr.search.DocSet) in org.apache.solr.search.DocSet cannot be applied to (org.apache.solr.search.DocSlice)
 [javac]   return other.intersection(this);
 [javac]   ^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:139: intersectionSize(org.apache.solr.search.DocSet) in org.apache.solr.search.DocSet cannot be applied to (org.apache.solr.search.DocSlice)
 [javac]   return other.intersectionSize(this);
 [javac]   ^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\ExtendedDismaxQParserPlugin.java:829: warning: [unchecked] unchecked conversion
 [javac] found   : java.util.List
 [javac] required: java.util.List<org.apache.lucene.search.BooleanClause>
 [javac]   Query q = super.getBooleanQuery(clauses, disableCoord);
 [javac]   ^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\ExtendedDismaxQParserPlugin.java:845: warning: [unchecked] unchecked conversion
 [javac] found   : java.util.List
 [javac] required: java.util.List<org.apache.lucene.search.BooleanClause>
 [javac]   super.addClause(clauses, conj, mods, q);
 [javac]   ^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\FastLRUCache.java:107: warning: [unchecked] unchecked cast
 [javac] found   : java.lang.Object
 [javac] required: java.util.List<org.apache.solr.common.util.ConcurrentLRUCache.Stats>
 [javac] statsList = (List<ConcurrentLRUCache.Stats>) persistence;
 [javac]  ^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\FastLRUCache.java:263: warning: [unchecked] unchecked cast
 [javac] found   : java.util.Set
 [javac] required: java.util.Set<java.util.Map.Entry>
 [javac]   for (Map.Entry e : (Set<Map.Entry>)items.entrySet()) {
 [javac]

Re: svn commit: r1081745 - /lucene/dev/trunk/lucene/src/test/org/apache/lucene/util/automaton/fst/TestFSTs.java

2011-03-15 Thread Michael McCandless
Looks good Dawid!

On Tue, Mar 15, 2011 at 8:20 AM,  dwe...@apache.org wrote:
 Author: dweiss
 Date: Tue Mar 15 12:20:03 2011
 New Revision: 1081745

 URL: http://svn.apache.org/viewvc?rev=1081745&view=rev
 Log:
 Adding -noverify and a little bit nicer output to TestFSTs. These are 
 debugging/analysis utils that are not used anywhere, so I commit them without 
 the patch.

 Modified:
    
 lucene/dev/trunk/lucene/src/test/org/apache/lucene/util/automaton/fst/TestFSTs.java

 Modified: 
 lucene/dev/trunk/lucene/src/test/org/apache/lucene/util/automaton/fst/TestFSTs.java
 URL: 
 http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/src/test/org/apache/lucene/util/automaton/fst/TestFSTs.java?rev=1081745&r1=1081744&r2=1081745&view=diff
 ==
 --- 
 lucene/dev/trunk/lucene/src/test/org/apache/lucene/util/automaton/fst/TestFSTs.java
  (original)
 +++ 
 lucene/dev/trunk/lucene/src/test/org/apache/lucene/util/automaton/fst/TestFSTs.java
  Tue Mar 15 12:20:03 2011
 @@ -25,16 +25,7 @@ import java.io.IOException;
  import java.io.InputStreamReader;
  import java.io.OutputStreamWriter;
  import java.io.Writer;
 -import java.util.ArrayList;
 -import java.util.Arrays;
 -import java.util.Collections;
 -import java.util.HashMap;
 -import java.util.HashSet;
 -import java.util.Iterator;
 -import java.util.List;
 -import java.util.Map;
 -import java.util.Random;
 -import java.util.Set;
 +import java.util.*;

  import org.apache.lucene.analysis.MockAnalyzer;
  import org.apache.lucene.document.Document;
 @@ -1098,7 +1089,7 @@ public class TestFSTs extends LuceneTest

     protected abstract T getOutput(IntsRef input, int ord) throws IOException;

 -    public void run(int limit) throws IOException {
 +    public void run(int limit, boolean verify) throws IOException {
        BufferedReader is = new BufferedReader(new InputStreamReader(new FileInputStream(wordsFileIn), "UTF-8"), 65536);
       try {
         final IntsRef intsRef = new IntsRef(10);
 @@ -1115,7 +1106,9 @@ public class TestFSTs extends LuceneTest

           ord++;
           if (ord % 50 == 0) {
 -            System.out.println(((System.currentTimeMillis()-tStart)/1000.0) + "s: " + ord + "...");
 +            System.out.println(
 +                String.format(Locale.ENGLISH,
 +                    "%6.2fs: %9d...", ((System.currentTimeMillis() - tStart) / 1000.0), ord));
            }
            if (ord >= limit) {
             break;
 @@ -1144,6 +1137,10 @@ public class TestFSTs extends LuceneTest

          System.out.println("Saved FST to fst.bin.");

 +        if (!verify) {
 +          System.exit(0);
 +        }
 +
          System.out.println("\nNow verify...");

         is.close();
 @@ -1194,6 +1191,7 @@ public class TestFSTs extends LuceneTest
     int inputMode = 0;                             // utf8
     boolean storeOrds = false;
     boolean storeDocFreqs = false;
 +    boolean verify = true;
      while(idx < args.length) {
        if (args[idx].equals("-prune")) {
         prune = Integer.valueOf(args[1+idx]);
 @@ -1215,6 +1213,9 @@ public class TestFSTs extends LuceneTest
        if (args[idx].equals("-ords")) {
          storeOrds = true;
        }
 +      if (args[idx].equals("-noverify")) {
 +        verify = false;
 +      }
       idx++;
     }

 @@ -1235,7 +1236,7 @@ public class TestFSTs extends LuceneTest
            return new PairOutputs.Pair<Long,Long>(o1.get(ord),
                                                   o2.get(_TestUtil.nextInt(rand, 1, 5000)));
         }
 -      }.run(limit);
 +      }.run(limit, verify);
     } else if (storeOrds) {
       // Store only ords
        final PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton(true);
 @@ -1244,7 +1245,7 @@ public class TestFSTs extends LuceneTest
         public Long getOutput(IntsRef input, int ord) {
           return outputs.get(ord);
         }
 -      }.run(limit);
 +      }.run(limit, verify);
     } else if (storeDocFreqs) {
       // Store only docFreq
        final PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton(false);
 @@ -1257,7 +1258,7 @@ public class TestFSTs extends LuceneTest
           }
           return outputs.get(_TestUtil.nextInt(rand, 1, 5000));
         }
 -      }.run(limit);
 +      }.run(limit, verify);
     } else {
       // Store nothing
       final NoOutputs outputs = NoOutputs.getSingleton();
 @@ -1267,7 +1268,7 @@ public class TestFSTs extends LuceneTest
         public Object getOutput(IntsRef input, int ord) {
           return NO_OUTPUT;
         }
 -      }.run(limit);
 +      }.run(limit, verify);
     }
   }







-- 
Mike

http://blog.mikemccandless.com
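
For readers who want to see what the reworked progress line in the commit above prints, here is a small standalone sketch. The class name ProgressLine is hypothetical; only the format string and the use of Locale.ENGLISH are taken from the diff. Pinning the locale keeps the decimal separator a dot even on machines whose default locale would print a comma.

```java
import java.util.Locale;

/** Hypothetical wrapper around the format call used in the commit. */
final class ProgressLine {

  /** Formats elapsed seconds (width 6, 2 decimals) and a right-aligned count (width 9). */
  static String format(long elapsedMillis, int ord) {
    return String.format(Locale.ENGLISH, "%6.2fs: %9d...", elapsedMillis / 1000.0, ord);
  }

  public static void main(String[] args) {
    System.out.println(format(1500, 500));
    System.out.println(format(123456, 1500000));
  }
}
```

The fixed widths make successive progress lines line up in columns, which is presumably the "little bit nicer output" the log message refers to.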




Re: svn commit: r1081745 - /lucene/dev/trunk/lucene/src/test/org/apache/lucene/util/automaton/fst/TestFSTs.java

2011-03-15 Thread Dawid Weiss
Thanks Mike :)
Dawid





[jira] Created: (LUCENE-2967) Use linear probing with an additional good bit avalanching function in FST's NodeHash.

2011-03-15 Thread Dawid Weiss (JIRA)
Use linear probing with an additional good bit avalanching function in FST's 
NodeHash.
--

 Key: LUCENE-2967
 URL: https://issues.apache.org/jira/browse/LUCENE-2967
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial
 Fix For: 4.0


I recently had an interesting discussion with Sebastiano Vigna (fastutil), who 
suggested that linear probing, given a hash mixing function with good avalanche 
properties, is a way better method of constructing lookups in associative 
arrays compared to quadratic probing. Indeed, with linear probing you can 
implement removals from a hash map without removed slot markers and linear 
probing has nice properties with respect to modern CPUs (caches). I've 
reimplemented HPPC's hash maps to use linear probing and we observed a nice 
speedup (the same applies to fastutil, of course).

This patch changes NodeHash's implementation to use linear probing. The code is 
a bit simpler (I think :). I also moved the load factor to a constant -- 0.5 
seems like a generous load factor, especially if we allow large FSTs to be 
built. I don't see any significant speedup in constructing large automata, but 
there is no slowdown either (I checked on one machine only for now, but will 
verify on other machines too).
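
To make the idea in the description above concrete, here is a minimal sketch of an open-addressing set of long keys that uses linear probing, a bit-mixing step and a constant 0.5 load factor. It is not the NodeHash patch: the class LinearProbingLongSet is hypothetical, the mixing function is the 64-bit finalizer from MurmurHash3 (an assumption, the patch may use something else), and 0 is reserved as the empty-slot marker.

```java
/** Hypothetical illustration of linear probing; not Lucene's NodeHash. */
final class LinearProbingLongSet {
  private static final double LOAD_FACTOR = 0.5;

  private long[] table = new long[16]; // length is always a power of two
  private int size;

  /** MurmurHash3's 64-bit finalizer: spreads input bits over the whole word. */
  static long mix(long k) {
    k ^= k >>> 33;
    k *= 0xff51afd7ed558ccdL;
    k ^= k >>> 33;
    k *= 0xc4ceb9fe1a85ec53L;
    k ^= k >>> 33;
    return k;
  }

  /** Adds a non-zero key; returns false if it was already present. */
  boolean add(long key) {
    if (key == 0) {
      throw new IllegalArgumentException("0 is reserved as the empty marker");
    }
    int mask = table.length - 1;
    int slot = (int) mix(key) & mask;
    while (table[slot] != 0) {
      if (table[slot] == key) {
        return false;
      }
      slot = (slot + 1) & mask; // linear probing: try the next slot
    }
    table[slot] = key;
    if (++size > table.length * LOAD_FACTOR) {
      rehash();
    }
    return true;
  }

  boolean contains(long key) {
    int mask = table.length - 1;
    int slot = (int) mix(key) & mask;
    while (table[slot] != 0) {
      if (table[slot] == key) {
        return true;
      }
      slot = (slot + 1) & mask;
    }
    return false;
  }

  int size() {
    return size;
  }

  /** Doubles the table and reinserts every key. */
  private void rehash() {
    long[] old = table;
    table = new long[old.length << 1];
    int mask = table.length - 1;
    for (long key : old) {
      if (key == 0) {
        continue;
      }
      int slot = (int) mix(key) & mask;
      while (table[slot] != 0) {
        slot = (slot + 1) & mask;
      }
      table[slot] = key;
    }
  }
}
```

Because the table is never more than half full, every probe sequence reaches an empty slot, and consecutive probes touch adjacent array entries, which is the cache-friendliness the description mentions. Removal is left out to keep the sketch short.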




[jira] Updated: (LUCENE-2967) Use linear probing with an additional good bit avalanching function in FST's NodeHash.

2011-03-15 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-2967:


Attachment: LUCENE-2967.patch

Linear probing in NodeHash.




[jira] Created: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Grant Ingersoll (JIRA)
UIMA jars are missing version numbers
-

 Key: SOLR-2427
 URL: https://issues.apache.org/jira/browse/SOLR-2427
 Project: Solr
  Issue Type: Bug
Reporter: Grant Ingersoll
Priority: Trivial


We should have version numbers on the UIMA jar files.




[jira] Updated: (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-15 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2952:


Attachment: LUCENE-2952.patch

Pretty close to standalone completion.  The next step is to hook it in.  I'm 
going to commit the license naming normalization now but not the validation 
code yet.

Also, renamed LicenseChecker to DependencyChecker as it might be useful for 
checking other things, such as that all jars have version numbers.

 Make license checking/maintenance easier/automated
 --

 Key: LUCENE-2952
 URL: https://issues.apache.org/jira/browse/LUCENE-2952
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
 Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch


 Instead of waiting until release to check licenses are valid, we should make 
 it a part of our build process to ensure that all dependencies have proper 
 licenses, etc.




[jira] Commented: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006930#comment-13006930
 ] 

Robert Muir commented on SOLR-2427:
---

I agree. I think it would be best to name them like the others in Solr, for 
example commons-csv-1.0-SNAPSHOT-r966014.jar




[jira] Created: (SOLR-2428) Upgrade carrot2-core dependency to a version with a Java 1.5-compiled jar

2011-03-15 Thread Steven Rowe (JIRA)
Upgrade carrot2-core dependency to a version with a Java 1.5-compiled jar
-

 Key: SOLR-2428
 URL: https://issues.apache.org/jira/browse/SOLR-2428
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Clustering
Affects Versions: 3.1.1, 3.2
Reporter: Steven Rowe
Priority: Minor
 Fix For: 3.1.1, 3.2


As of the not-yet-released version 3.4.4, carrot2-core will publish a retrowoven 
1.5 version of the jar - see Dawid Weiss's comment on 
[LUCENE-2957|https://issues.apache.org/jira/browse/LUCENE-2957?focusedCommentId=13006878&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13006878]




[jira] Commented: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13006936#comment-13006936
 ] 

Tommaso Teofili commented on SOLR-2427:
---

The mentioned jars have the following versions and revisions:
- uima-core.jar is 2.3.1 (released)
- uima-an-alchemy.jar is 2.3.1-SNAPSHOT revision 1062868
- uima-an-calais.jar is 2.3.1-SNAPSHOT revision 1062868
- uima-an-tagger.jar is 2.3.1-SNAPSHOT revision 1062868
- uima-an-wst.jar is 2.3.1-SNAPSHOT revision 1076132
Hope this helps
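
Purely as an illustration of how the versions listed above could be folded into file names in the style of commons-csv-1.0-SNAPSHOT-r966014.jar (the convention mentioned earlier in this thread), here is a tiny Java sketch. The helper VersionedJarName is hypothetical, and the names it produces are not necessarily the ones that were eventually committed.

```java
/** Hypothetical naming helper; the resulting names are only a suggestion. */
final class VersionedJarName {

  /**
   * Builds baseName-version[-rREVISION].jar. Pass null as the revision for
   * released artifacts that need no svn revision suffix.
   */
  static String of(String baseName, String version, String revision) {
    StringBuilder name = new StringBuilder(baseName).append('-').append(version);
    if (revision != null) {
      name.append("-r").append(revision);
    }
    return name.append(".jar").toString();
  }

  public static void main(String[] args) {
    System.out.println(of("uima-core", "2.3.1", null));
    System.out.println(of("uima-an-alchemy", "2.3.1-SNAPSHOT", "1062868"));
    System.out.println(of("uima-an-wst", "2.3.1-SNAPSHOT", "1076132"));
  }
}
```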




[jira] Commented: (LUCENE-2957) generate-maven-artifacts target should include all non-Mavenized Lucene & Solr dependencies

2011-03-15 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006937#comment-13006937
 ] 

Steven Rowe commented on LUCENE-2957:
-

Thanks Dawid - I've created SOLR-2428 to track upgrading once 3.4.4 has been 
released.

 generate-maven-artifacts target should include all non-Mavenized Lucene  
 Solr dependencies
 ---

 Key: LUCENE-2957
 URL: https://issues.apache.org/jira/browse/LUCENE-2957
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.1, 3.2, 4.0
Reporter: Steven Rowe
Assignee: Steven Rowe
Priority: Minor
 Fix For: 3.1, 3.2, 4.0

 Attachments: LUCENE-2923-part3.patch, LUCENE-2957-part2.patch, 
 LUCENE-2957.patch


 Currently, in addition to deploying artifacts for all of the Lucene and Solr 
 modules to a repository (by default local), the {{generate-maven-artifacts}} 
 target also deploys artifacts for the following non-Mavenized Solr 
 dependencies (lucene_solr_3_1 version given here):
 # {{solr/lib/commons-csv-1.0-SNAPSHOT-r966014.jar}} as 
 org.apache.solr:solr-commons-csv:3.1
 # {{solr/lib/apache-solr-noggit-r944541.jar}} as 
 org.apache.solr:solr-noggit:3.1
 \\ \\
 The following {{.jar}}'s should be added to the above list (lucene_solr_3_1 
 version given here):
 \\ \\
 # {{lucene/contrib/icu/lib/icu4j-4_6.jar}}
 # 
 {{lucene/contrib/benchmark/lib/xercesImpl-2.9.1-patched-XERCESJ}}{{-1257.jar}}
 # {{solr/contrib/clustering/lib/carrot2-core-3.4.2.jar}}**
 # {{solr/contrib/uima/lib/uima-an-alchemy.jar}}
 # {{solr/contrib/uima/lib/uima-an-calais.jar}}
 # {{solr/contrib/uima/lib/uima-an-tagger.jar}}
 # {{solr/contrib/uima/lib/uima-an-wst.jar}}
 # {{solr/contrib/uima/lib/uima-core.jar}}
 \\ \\
 I think it makes sense to follow the same model as the current non-Mavenized 
 dependencies:
 \\ \\
 * {{groupId}} = {{org.apache.solr/.lucene}}
 * {{artifactId}} = {{solr-/lucene-}}original-name,
 * {{version}} = lucene-solr-release-version.
 **The carrot2-core jar doesn't need to be included in trunk's release 
 artifacts, since there already is a Mavenized Java6-compiled jar.  branch_3x 
 and lucene_solr_3_1 will need this Solr-specific Java5-compiled maven 
 artifact, though.
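
The naming model proposed above can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical and not part of the build, and the expected values are the two existing coordinates quoted in the description.

```java
/** Hypothetical helper illustrating the proposed naming model; not part of the build. */
final class MavenCoordinates {

    /**
     * @param project        "solr" or "lucene"
     * @param originalName   the dependency's own name, e.g. "commons-csv"
     * @param releaseVersion the Lucene/Solr release version, e.g. "3.1"
     */
    static String coordinates(String project, String originalName, String releaseVersion) {
        if (!project.equals("solr") && !project.equals("lucene")) {
            throw new IllegalArgumentException("project must be solr or lucene: " + project);
        }
        String groupId = "org.apache." + project;         // org.apache.solr or org.apache.lucene
        String artifactId = project + "-" + originalName; // solr-<name> or lucene-<name>
        return groupId + ":" + artifactId + ":" + releaseVersion;
    }

    public static void main(String[] args) {
        System.out.println(coordinates("solr", "commons-csv", "3.1"));
        System.out.println(coordinates("solr", "noggit", "3.1"));
    }
}
```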

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Assigned: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe reassigned SOLR-2427:
-

Assignee: Steven Rowe

 UIMA jars are missing version numbers
 -

 Key: SOLR-2427
 URL: https://issues.apache.org/jira/browse/SOLR-2427
 Project: Solr
  Issue Type: Bug
Reporter: Grant Ingersoll
Assignee: Steven Rowe
Priority: Trivial

 We should have version numbers on the UIMA jar files.




[jira] Assigned: (SOLR-2428) Upgrade carrot2-core dependency to a version with a Java 1.5-compiled jar

2011-03-15 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss reassigned SOLR-2428:
-

Assignee: Dawid Weiss

 Upgrade carrot2-core dependency to a version with a Java 1.5-compiled jar
 -

 Key: SOLR-2428
 URL: https://issues.apache.org/jira/browse/SOLR-2428
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Clustering
Affects Versions: 3.1.1, 3.2
Reporter: Steven Rowe
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 3.1.1, 3.2


 As of not-yet-released version 3.4.4, the carrot2-core will publish a 
 retowoven 1.5 version of the jar - see Dawid Weiss's comment on 
 [LUCENE-2957|https://issues.apache.org/jira/browse/LUCENE-2957?focusedCommentId=13006878&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13006878]




[jira] Updated: (SOLR-2428) Upgrade carrot2-core dependency to a version with a Java 1.5-compiled jar

2011-03-15 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated SOLR-2428:
--

Description: As of not-yet-released version 3.4.4, the carrot2-core jar 
will be published as a retrowoven 1.5 version (in addition to a 
Java-1.6-compiled version) - see Dawid Weiss's comment on 
[LUCENE-2957|https://issues.apache.org/jira/browse/LUCENE-2957?focusedCommentId=13006878&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13006878]
  (was: As of not-yet-released version 3.4.4, the carrot2-core will publish a 
retowoven 1.5 version of the jar - see Dawid Weiss's comment on 
[LUCENE-2957|https://issues.apache.org/jira/browse/LUCENE-2957?focusedCommentId=13006878&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13006878])

 Upgrade carrot2-core dependency to a version with a Java 1.5-compiled jar
 -

 Key: SOLR-2428
 URL: https://issues.apache.org/jira/browse/SOLR-2428
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Clustering
Affects Versions: 3.1.1, 3.2
Reporter: Steven Rowe
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 3.1.1, 3.2


 As of not-yet-released version 3.4.4, the carrot2-core jar will be published 
 as a retrowoven 1.5 version (in addition to a Java-1.6-compiled version) - 
 see Dawid Weiss's comment on 
 [LUCENE-2957|https://issues.apache.org/jira/browse/LUCENE-2957?focusedCommentId=13006878&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13006878]




[jira] Commented: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006942#comment-13006942
 ] 

Steven Rowe commented on SOLR-2427:
---

Thanks Tommaso, I will rename them.

Separately, although you previously said that uima-core.jar is the released 
2.3.1 version, I still had been thinking that along with the other UIMA jars, 
its maven artifact should be published under the Apache Solr project.  That 
makes little sense, though, now that I have reconsidered it, so I'll drop maven 
publishing of the Solr-specific uima-core jar.  The other UIMA SNAPSHOT 
dependencies, however, will need to be published as Solr-specific versions, 
since the maven central repository rejects POMs with SNAPSHOT dependencies.

 UIMA jars are missing version numbers
 -

 Key: SOLR-2427
 URL: https://issues.apache.org/jira/browse/SOLR-2427
 Project: Solr
  Issue Type: Bug
Reporter: Grant Ingersoll
Assignee: Steven Rowe
Priority: Trivial

 We should have version numbers on the UIMA jar files.




[jira] Updated: (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-15 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2952:


Attachment: LUCENE-2952.patch

This hooks it into compile-core, but has the unfortunate side-effect of being 
called a whole bunch of times, which is not good.  Need to read up on how to 
avoid that in ant (or if anyone has suggestions, that would be great).

Otherwise, I think the baseline functionality is ready to go.

 Make license checking/maintenance easier/automated
 --

 Key: LUCENE-2952
 URL: https://issues.apache.org/jira/browse/LUCENE-2952
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
 Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, 
 LUCENE-2952.patch


 Instead of waiting until release to check licenses are valid, we should make 
 it a part of our build process to ensure that all dependencies have proper 
 licenses, etc.




[jira] Commented: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006946#comment-13006946
 ] 

Tommaso Teofili commented on SOLR-2427:
---

bq.  That makes little sense, though, now that I have reconsidered it, so I'll 
drop maven publishing of the Solr-specific uima-core jar. The other UIMA 
SNAPSHOT dependencies, however, will need to be published as Solr-specific 
versions, since the maven central repository rejects POMs with SNAPSHOT 
dependencies.

+1 :)

 UIMA jars are missing version numbers
 -

 Key: SOLR-2427
 URL: https://issues.apache.org/jira/browse/SOLR-2427
 Project: Solr
  Issue Type: Bug
Reporter: Grant Ingersoll
Assignee: Steven Rowe
Priority: Trivial

 We should have version numbers on the UIMA jar files.




[jira] Commented: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006950#comment-13006950
 ] 

Steven Rowe commented on SOLR-2427:
---

Hmm, [uimaj-core-2.3.1.jar in the maven 
repository|http://repo1.maven.org/maven2/org/apache/uima/uimaj-core/2.3.1/] was 
compiled with Java 1.6, while the version in {{solr/contrib/uima/lib/}} was 
compiled with Java 1.5.  Tommaso, do you know of a maven-hosted 
Java-1.5-compiled version of the uima-core jar?  If not, I will leave things as 
they are now, continuing to publish a Solr-specific Java-1.5-compiled uima-core 
jar.

 UIMA jars are missing version numbers
 -

 Key: SOLR-2427
 URL: https://issues.apache.org/jira/browse/SOLR-2427
 Project: Solr
  Issue Type: Bug
Reporter: Grant Ingersoll
Assignee: Steven Rowe
Priority: Trivial

 We should have version numbers on the UIMA jar files.




[jira] Commented: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006951#comment-13006951
 ] 

Tommaso Teofili commented on SOLR-2427:
---

That is unexpected as UIMA should've been deployed with 1.5. I'll check this 
out as soon as I can.

 UIMA jars are missing version numbers
 -

 Key: SOLR-2427
 URL: https://issues.apache.org/jira/browse/SOLR-2427
 Project: Solr
  Issue Type: Bug
Reporter: Grant Ingersoll
Assignee: Steven Rowe
Priority: Trivial

 We should have version numbers on the UIMA jar files.




[jira] Commented: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006952#comment-13006952
 ] 

Steven Rowe commented on SOLR-2427:
---

Crap, I got the uima-core situation exactly backward.

The version in {{solr/contrib/uima/lib/}} was compiled, by you, Tommaso, using 
Java 1.6 (according to {{META-INF/MANIFEST.MF}}).  However, since the 
clustering contrib tests succeed under Java 1.5, I assume that although the jar 
was compiled using Java 1.6, the target version was 1.5.

The version in the maven central repository was actually compiled with 1.5 
(again, according to {{META-INF/MANIFEST.MF}}).

Tommaso, why is the version in Solr's source tree different from the maven 
version of the jar?
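
A note on telling the two apart: the manifest records which JDK ran the build, while the bytecode target is stored in each class file's header (major version 49 for Java 5, 50 for Java 6). The sketch below reads that header from the first class in a jar. The class name is hypothetical and the jar path is whatever is passed on the command line.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical spot-check tool; not part of Solr or UIMA. */
final class ClassTargetVersion {

    /** Returns the class-file major version: 49 = Java 5, 50 = Java 6. */
    static int majorVersion(InputStream classBytes) throws IOException {
        DataInputStream in = new DataInputStream(classBytes);
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        in.readUnsignedShort();        // minor version, ignored here
        return in.readUnsignedShort(); // major version
    }

    public static void main(String[] args) throws IOException {
        try (JarFile jar = new JarFile(args[0])) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.getName().endsWith(".class")) {
                    try (InputStream in = jar.getInputStream(entry)) {
                        System.out.println(entry.getName() + " -> major version " + majorVersion(in));
                    }
                    return; // one class is enough for a spot check
                }
            }
        }
    }
}
```

A jar built by a 1.6 JDK with a 1.5 target would report 49 here, which would be consistent with tests passing under Java 1.5.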

 UIMA jars are missing version numbers
 -

 Key: SOLR-2427
 URL: https://issues.apache.org/jira/browse/SOLR-2427
 Project: Solr
  Issue Type: Bug
Reporter: Grant Ingersoll
Assignee: Steven Rowe
Priority: Trivial

 We should have version numbers on the UIMA jar files.




[jira] Commented: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006960#comment-13006960
 ] 

Steven Rowe commented on SOLR-2427:
---

It looks to me like the UIMA contrib was committed before uima-core 2.3.1 was 
released, using a 2.3.1-SNAPSHOT version of the jar, and then never upgraded 
after the release.

I think it makes sense to switch the version of the uima-core jar in Solr's 
source tree to the released 2.3.1 version, and then stop publishing a 
Solr-specific uima-core jar.

 UIMA jars are missing version numbers
 -

 Key: SOLR-2427
 URL: https://issues.apache.org/jira/browse/SOLR-2427
 Project: Solr
  Issue Type: Bug
Reporter: Grant Ingersoll
Assignee: Steven Rowe
Priority: Trivial

 We should have version numbers on the UIMA jar files.




[jira] Updated: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated SOLR-2427:
--

 Priority: Blocker  (was: Trivial)
Affects Version/s: 3.1
Fix Version/s: 3.1

 UIMA jars are missing version numbers
 -

 Key: SOLR-2427
 URL: https://issues.apache.org/jira/browse/SOLR-2427
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.1
Reporter: Grant Ingersoll
Assignee: Steven Rowe
Priority: Blocker
 Fix For: 3.1


 We should have version numbers on the UIMA jar files.




[jira] Commented: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks

2011-03-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007003#comment-13007003
 ] 

Michael McCandless commented on LUCENE-2573:


bq. it currently holds the ram usage for that DWPT when it was checked out so 
that I can reduce the flushBytes accordingly. We can maybe get rid of it 
entirely but I don't want to rely on the DWPT bytesUsed() though.

Hmm, but, once a DWPT is pulled from production, its bytesUsed()
should not be changing anymore?  Like why can't we use it to hold its
bytesUsed?

bq. I generally don't like cluttering DocWriter and letting it grow like IW. 
DocWriterSession might not be the ideal name for this class but it's really a 
RAM tracker for this DW. Yet, we can move out some parts that do not directly 
relate to mem tracking. Maybe DocWriterBytes?

Well DocWriter is quite small now :) (On RT branch).  And adding
another class means we have to be careful about proper sync'ing (lock
order, to avoid deadlock)... and I think it should get smaller if we
can remove state[] array, FlushState enum, etc. but, OK I guess we can
leave it as separate for now.  How about DocumentsWriterRAMUsage?
RAMTracker?

{quote}
bq. Instead of FlushPolicy.message, can't the policy call DW.message?

I don't want to couple that API to DW. What would be the benefit besides 
saving a single method?
{quote}

Hmm, good point.  Though, it already has a SetOnceDocumentsWriter --
how come?  Can the policy call IW.message?  I just think FlushPolicy
ought to be very lean, ie show you exactly what you need to
implement...

{quote}
bq. On the by-RAM flush policies... when you hit the high water mark, we
should 
1) flush all DWPTs and 2) stall any other threads.

Well I am not sure if we should do that. I don't really see why we should 
forcefully stop the world here. Incoming threads will pick up a flush 
immediately, and if we have enough resources to index further, why should we wait 
until all DWPTs are flushed? If we stall, I fear that we could queue up threads 
that could help flushing, while stalling would simply stop them from doing anything, 
right? You can still control this with the healthiness though. We currently do 
flush all DWPTs, btw, once we hit the HW.
{quote}

As long as we default the high mark to something generous (2X low
mark), I think this approach should work well.

Ie, we begin flushing as soon as low mark is crossed on active RAM.
We pick the biggest DWPT and take it out of rotation, and immediately
deduct its RAM usage from the active pool.  If, while we are still
flushing, active RAM again grows above the low mark, then we pull
another DWPT, etc.  But then if ever the total flushing + active
exceeds the high mark, we stall.

BTW why do we track flushPending RAM vs flushing RAM?  Is that
distinction necessary?  (Can't we just track flushing RAM?).
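
The accounting described in this comment (start flushing when active RAM crosses the low mark, stall when active plus flushing RAM exceeds the high mark) can be sketched roughly as follows. This is a simplified single-threaded model under stated assumptions: the class and method names are hypothetical, DWPTs are represented only by their byte counts, and a DWPT does not grow after it is registered. It is not the code in the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical, single-threaded model of low/high water mark accounting. */
final class WaterMarkAccounting {
    private final long lowMark;
    private final long highMark;
    private final List<Long> activeDwpts = new ArrayList<>(); // bytes held by each active DWPT
    private long activeBytes;
    private long flushingBytes;

    WaterMarkAccounting(long lowMark, long highMark) {
        this.lowMark = lowMark;
        this.highMark = highMark;
    }

    /** Registers RAM held by an active DWPT, then pulls DWPTs while above the low mark. */
    void addActive(long bytes) {
        activeDwpts.add(bytes);
        activeBytes += bytes;
        while (activeBytes > lowMark && !activeDwpts.isEmpty()) {
            Long biggest = Collections.max(activeDwpts);
            activeDwpts.remove(biggest); // take it out of rotation
            activeBytes -= biggest;      // deduct from the active pool right away
            flushingBytes += biggest;    // it now counts as flushing RAM
        }
    }

    /** Called when a pulled DWPT has finished flushing and its RAM is released. */
    void flushFinished(long bytes) {
        flushingBytes -= bytes;
    }

    /** Indexing threads should stall while total RAM exceeds the high mark. */
    boolean mustStall() {
        return activeBytes + flushingBytes > highMark;
    }

    long activeBytes() { return activeBytes; }

    long flushingBytes() { return flushingBytes; }

    public static void main(String[] args) {
        WaterMarkAccounting ram = new WaterMarkAccounting(100, 200); // high mark = 2X low mark
        ram.addActive(60);
        ram.addActive(50); // crosses the low mark, so the 60-byte DWPT is pulled
        System.out.println("active=" + ram.activeBytes() + " flushing=" + ram.flushingBytes()
                + " stall=" + ram.mustStall());
    }
}
```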


 Tiered flushing of DWPTs by RAM with low/high water marks
 -

 Key: LUCENE-2573
 URL: https://issues.apache.org/jira/browse/LUCENE-2573
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Simon Willnauer
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
 LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch


 Now that we have DocumentsWriterPerThreads we need to track total consumed 
 RAM across all DWPTs.
 A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
 tiered approach:  
 - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
 - Flush all DWPTs at a high water mark (e.g. at 110%)
 - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
 used, flush at 90%, 95%, 100%, 105% and 110%.
 Should we allow the user to configure the low and high water mark values 
 explicitly using total values (e.g. low water mark at 120MB, high water mark 
 at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() 
 config method and use something like 90% and 110% for the water marks?
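
For illustration, the linear steps between the two water marks could be computed as below. The class and method names are hypothetical; the inputs are the 5-DWPT, 90% to 110% example from the description.

```java
/** Hypothetical helper; not part of Lucene. */
final class TieredFlushSteps {

    /**
     * Returns one flush trigger per DWPT, spaced linearly from lowMark to
     * highMark. Both marks are fractions of the configured RAM buffer size.
     */
    static double[] linearSteps(int numDwpts, double lowMark, double highMark) {
        if (numDwpts < 1 || lowMark > highMark) {
            throw new IllegalArgumentException("need numDwpts >= 1 and lowMark <= highMark");
        }
        double[] steps = new double[numDwpts];
        for (int i = 0; i < numDwpts; i++) {
            // With a single DWPT there is no interval to divide, so use the low mark.
            double fraction = numDwpts == 1 ? 0.0 : (double) i / (numDwpts - 1);
            steps[i] = lowMark + fraction * (highMark - lowMark);
        }
        return steps;
    }

    public static void main(String[] args) {
        for (double step : linearSteps(5, 0.90, 1.10)) {
            System.out.printf("flush at %.0f%% of the RAM buffer%n", step * 100);
        }
    }
}
```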




[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007011#comment-13007011
 ] 

Michael McCandless commented on LUCENE-2960:


bq. Hmmm, infoStream is just for debugging... should we really make it volatile?

I'll remove its volatile...

{quote}
bq. IWC cannot be made immutable – you build it up incrementally (new 
IWC(...).setThis(...).setThat(...)). Its fields cannot be final.

Setters can return modified immutable copy of 'this'. So you get both 
incremental building and immutability.
{quote}

Oh yeah.  But then we'd clone the full IWC on every set... this seems
like overkill in the name of purity.
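
For reference, the copy-on-set idea under discussion looks roughly like this. It is a minimal sketch: the class is hypothetical (it is not IndexWriterConfig), it carries only two settings, and the default values are placeholders.

```java
/** Hypothetical immutable config; each setter returns a modified copy. */
final class ImmutableWriterConfig {
    private final double ramBufferSizeMB;
    private final int termIndexInterval;

    ImmutableWriterConfig() {
        this(16.0, 128); // placeholder defaults for the sketch
    }

    private ImmutableWriterConfig(double ramBufferSizeMB, int termIndexInterval) {
        this.ramBufferSizeMB = ramBufferSizeMB;
        this.termIndexInterval = termIndexInterval;
    }

    /** Leaves this instance untouched and returns a copy with the new value. */
    ImmutableWriterConfig setRAMBufferSizeMB(double mb) {
        return new ImmutableWriterConfig(mb, termIndexInterval);
    }

    ImmutableWriterConfig setTermIndexInterval(int interval) {
        return new ImmutableWriterConfig(ramBufferSizeMB, interval);
    }

    double getRAMBufferSizeMB() { return ramBufferSizeMB; }

    int getTermIndexInterval() { return termIndexInterval; }

    public static void main(String[] args) {
        ImmutableWriterConfig base = new ImmutableWriterConfig();
        // Chaining still works; every call allocates one small object.
        ImmutableWriterConfig tuned = base.setRAMBufferSizeMB(64.0).setTermIndexInterval(32);
        System.out.println(base.getRAMBufferSizeMB() + " -> " + tuned.getRAMBufferSizeMB());
    }
}
```

The cost being debated is the one allocation per setter call; the benefit is that all fields can be final.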

{quote}
What about earlier compromise mentioned by Shay, Mark, me? Keep setters for 
'live' properties on IW.
This clearly draws the line, and you don't have to consult Javadocs for each 
and every setting to know if you can change it live or not.
{quote}

I really don't like that this approach would split IW configuration
into two places.  Like you look at the javadocs for IWC and think that
you cannot change the RAM buffer size.

IWC should be the one place you go to see which settings you can
change about the IW.  That some of these settings take effect live
while others do not is really an orthogonal (and I think, secondary,
ie handled fine w/ jdocs) aspect/concern.


 Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
 --

 Key: LUCENE-2960
 URL: https://issues.apache.org/jira/browse/LUCENE-2960
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shay Banon
Priority: Blocker
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2960.patch


 In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. 
 It would be great to be able to control that on a live IndexWriter. Two 
 other methods that would be great to bring back are 
 setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other 
 setters can actually be set on the MergePolicy itself, so no need for setters 
 for those (I think).




[jira] Commented: (LUCENE-2967) Use linear probing with an additional good bit avalanching function in FST's NodeHash.

2011-03-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007031#comment-13007031
 ] 

Michael McCandless commented on LUCENE-2967:


Hmm, unfortunately, I'm seeing the patch make FST building slower, at
least in my env/test set.  I built FST for the 38M wikipedia terms.

I ran 6 times each, alternating trunk & patch.

I also turned off saving the FST, and ran -noverify, so I'm only
measuring time to build it.  I run java -Xmx2g -Xms2g -Xbatch, and
measure wall clock time.

Times on trunk (seconds):

{noformat}
  43.795
  43.493
  44.343
  44.045
  43.645
  43.846
{noformat}

Times w/ patch:

{noformat}
  46.595
  47.751
  47.901
  47.901
  47.901
  47.700
{noformat}

We could also try less generous load factors...
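
Averaging the six runs listed above gives about 43.86 s on trunk versus about 47.62 s with the patch, i.e. the patched build is roughly 8.6% slower in this environment. The arithmetic, as a small sketch (the class name is hypothetical; the numbers are copied from the two lists):

```java
import java.util.Locale;

/** Hypothetical helper that averages the wall-clock times quoted in this comment. */
final class FstBuildTimes {
    static final double[] TRUNK = {43.795, 43.493, 44.343, 44.045, 43.645, 43.846};
    static final double[] PATCH = {46.595, 47.751, 47.901, 47.901, 47.901, 47.700};

    static double mean(double[] seconds) {
        double sum = 0.0;
        for (double s : seconds) {
            sum += s;
        }
        return sum / seconds.length;
    }

    /** Relative slowdown of the patch versus trunk, in percent. */
    static double slowdownPercent() {
        return (mean(PATCH) / mean(TRUNK) - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "trunk mean: %.3f s%n", mean(TRUNK));
        System.out.printf(Locale.ROOT, "patch mean: %.3f s%n", mean(PATCH));
        System.out.printf(Locale.ROOT, "slowdown:   %.1f%%%n", slowdownPercent());
    }
}
```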


 Use linear probing with an additional good bit avalanching function in FST's 
 NodeHash.
 --

 Key: LUCENE-2967
 URL: https://issues.apache.org/jira/browse/LUCENE-2967
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial
 Fix For: 4.0

 Attachments: LUCENE-2967.patch


 I recently had an interesting discussion with Sebastiano Vigna (fastutil), 
 who suggested that linear probing, given a hash mixing function with good 
 avalanche properties, is a way better method of constructing lookups in 
 associative arrays compared to quadratic probing. Indeed, with linear probing 
 you can implement removals from a hash map without removed slot markers and 
 linear probing has nice properties with respect to modern CPUs (caches). I've 
 reimplemented HPPC's hash maps to use linear probing and we observed a nice 
 speedup (the same applies for fastutils of course).
 This patch changes NodeHash's implementation to use linear probing. The code 
 is a bit simpler (I think :). I also moved the load factor to a constant -- 
 0.5 seems like a generous load factor, especially if we allow large FSTs to 
 be built. I don't see any significant speedup in constructing large automata, 
 but there is no slowdown either (I checked on one machine only for now, but 
 will verify on other machines too).
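
To make the technique concrete, here is a minimal open-addressing set that uses linear probing, a bit-mixing step, and a 0.5 load factor. It is a sketch under stated assumptions, not the patch: the class name is hypothetical, it stores plain non-zero ints rather than FST nodes, the mixing step is the MurmurHash3 32-bit finalizer (chosen here for illustration), and removal is left out.

```java
/** Hypothetical linear-probing set of non-zero ints; 0 marks an empty slot. */
final class LinearProbingIntSet {
    private static final double LOAD_FACTOR = 0.5;
    private static final int EMPTY = 0;

    private int[] slots = new int[16]; // length is always a power of two
    private int size;

    /** MurmurHash3 32-bit finalizer: spreads changes in any input bit across the output. */
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Adds the key; returns false if it was already present. */
    boolean add(int key) {
        if (key == EMPTY) {
            throw new IllegalArgumentException("0 is reserved as the empty marker");
        }
        if (size + 1 > slots.length * LOAD_FACTOR) {
            rehash();
        }
        if (!insert(slots, key)) {
            return false;
        }
        size++;
        return true;
    }

    boolean contains(int key) {
        int mask = slots.length - 1;
        // Probe consecutive slots until the key or an empty slot is found.
        for (int i = mix(key) & mask; slots[i] != EMPTY; i = (i + 1) & mask) {
            if (slots[i] == key) {
                return true;
            }
        }
        return false;
    }

    int size() { return size; }

    private static boolean insert(int[] table, int key) {
        int mask = table.length - 1;
        int i = mix(key) & mask;
        while (table[i] != EMPTY) {
            if (table[i] == key) {
                return false;
            }
            i = (i + 1) & mask; // linear probing: try the next slot
        }
        table[i] = key;
        return true;
    }

    private void rehash() {
        int[] bigger = new int[slots.length * 2];
        for (int key : slots) {
            if (key != EMPTY) {
                insert(bigger, key);
            }
        }
        slots = bigger;
    }

    public static void main(String[] args) {
        LinearProbingIntSet set = new LinearProbingIntSet();
        for (int i = 1; i <= 1000; i++) {
            set.add(i);
        }
        System.out.println(set.size() + " keys, contains(500)=" + set.contains(500));
    }
}
```

Because the table never fills past half, every probe sequence is guaranteed to reach an empty slot.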




[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-15 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007036#comment-13007036
 ] 

Mark Miller commented on LUCENE-2960:
-

{quote}I really don't like that this approach would split IW configuration
into two places.  Like you look at the javadocs for IWC and think that
you cannot change the RAM buffer size.

IWC should be the one place you go to see which settings you can
change about the IW.  That some of these settings take effect live
while others do not is really an orthogonal (and I think, secondary,
ie handled fine w/ jdocs) aspect/concern.{quote}

You can just as easily argue that the javadocs for IWC could explain that live 
settings are on the IW.

That pattern just smells wrong. 

{quote}
But, if you want to change something live, you can
IW.getConfig().setFoo(...). The config instance is a private clone to
that IW.
{quote}

This is better than nothing.

Another thought is to offer all settings on the IWC for init convenience and 
exposure and then add javadoc about updaters on IW for those settings that can 
be changed on the fly - or one update method and enums...

 Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
 --

 Key: LUCENE-2960
 URL: https://issues.apache.org/jira/browse/LUCENE-2960
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shay Banon
Priority: Blocker
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2960.patch


 In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. 
 It would be great to be able to control that on a live IndexWriter. Two 
 other methods that would be great to bring back are 
 setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other 
 setters can actually be set on the MergePolicy itself, so no need for setters 
 for those (I think).




[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-15 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007043#comment-13007043
 ] 

Steven Rowe commented on LUCENE-2960:
-

How about an IWC base class, extended by IWCinit and IWClive.  IWCinit has 
setters for everything, and IW.getConfig() returns IWClive, which has no 
setters for things you can't set on the fly.
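
A rough sketch of that split, to show how the type system would document which settings are live. All names are hypothetical stand-ins (WriterConfigBase for IWC, WriterConfigInit for IWCinit, WriterConfigLive for IWClive), and the choice of the open mode as the init-only setting is an assumption made for the example.

```java
/** Hypothetical stand-in for an init-only setting. */
enum SketchOpenMode { CREATE, APPEND }

/** Shared read-only view; plays the role of the proposed IWC base class. */
abstract class WriterConfigBase {
    private volatile double ramBufferSizeMB = 16.0;          // may change while the writer is open
    private SketchOpenMode openMode = SketchOpenMode.APPEND; // fixed once the writer is open

    double getRAMBufferSizeMB() { return ramBufferSizeMB; }

    SketchOpenMode getOpenMode() { return openMode; }

    protected void ramBufferSizeMB(double mb) { this.ramBufferSizeMB = mb; }

    protected void openMode(SketchOpenMode mode) { this.openMode = mode; }
}

/** Passed to the writer at construction time: setters for everything. */
final class WriterConfigInit extends WriterConfigBase {
    WriterConfigInit setRAMBufferSizeMB(double mb) { ramBufferSizeMB(mb); return this; }

    WriterConfigInit setOpenMode(SketchOpenMode mode) { openMode(mode); return this; }
}

/** Returned by the writer's getConfig(): setters only for live settings. */
final class WriterConfigLive extends WriterConfigBase {
    WriterConfigLive(WriterConfigBase source) {
        ramBufferSizeMB(source.getRAMBufferSizeMB());
        openMode(source.getOpenMode());
    }

    WriterConfigLive setRAMBufferSizeMB(double mb) { ramBufferSizeMB(mb); return this; }
    // Deliberately no setOpenMode(): changing it on an open writer would not compile.
}

final class ConfigSplitDemo {
    public static void main(String[] args) {
        WriterConfigInit init = new WriterConfigInit()
                .setRAMBufferSizeMB(32.0)
                .setOpenMode(SketchOpenMode.CREATE);
        WriterConfigLive live = new WriterConfigLive(init); // what getConfig() would hand back
        live.setRAMBufferSizeMB(64.0);
        System.out.println(live.getOpenMode() + " " + live.getRAMBufferSizeMB());
    }
}
```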

 Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
 --

 Key: LUCENE-2960
 URL: https://issues.apache.org/jira/browse/LUCENE-2960
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shay Banon
Priority: Blocker
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2960.patch


 In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. 
 It would be great to be able to control that on a live IndexWriter. Two 
 other methods that would be great to bring back are 
 setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other 
 setters can actually be set on the MergePolicy itself, so no need for setters 
 for those (I think).




[jira] Resolved: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe resolved SOLR-2427.
---

   Resolution: Fixed
Fix Version/s: 4.0
   3.2

Committed:
- lucene_solr_3_1 revision 1081856
- branch_3x revision 1081860
- trunk revision 1081880

Ant build & tests succeed.  Maven build & tests succeed.  {{ant -Dversion=... 
-Dspecversion=... prepare-release sign-artifacts}} works and the generated 
Maven artifacts look good.

 UIMA jars are missing version numbers
 -

 Key: SOLR-2427
 URL: https://issues.apache.org/jira/browse/SOLR-2427
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.1
Reporter: Grant Ingersoll
Assignee: Steven Rowe
Priority: Blocker
 Fix For: 3.1, 3.2, 4.0


 We should have version numbers on the UIMA jar files.




[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-15 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007048#comment-13007048
 ] 

Earwin Burrfoot commented on LUCENE-2960:
-

bq. Oh yeah. But then we'd clone the full IWC on every set... this seems like 
overkill in the name of purity.
So what? What exactly is overkill? Few wasted bytes and CPU ns for an object 
that's created a couple of times during application lifetime?
There are also builders, which are very similar to what Steven is proposing.

bq. Another thought is to offer all settings on the IWC for init convenience 
and exposure and then add javadoc about updaters on IW for those settings that 
can be changed on the fly
That's exactly how I'd like to see it.

 Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
 --

 Key: LUCENE-2960
 URL: https://issues.apache.org/jira/browse/LUCENE-2960
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shay Banon
Priority: Blocker
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2960.patch


 In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. 
 It would be great to be able to control that on a live IndexWriter. Two 
 other methods that would be great to bring back are 
 setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other 
 setters can actually be set on the MergePolicy itself, so no need for setters 
 for those (I think).




[jira] Commented: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007078#comment-13007078
 ] 

Tommaso Teofili commented on SOLR-2427:
---

Hello Steven,
I found the problem being (damn) silent JVM update in Mac OSX which symlinked 
1.5 Java version to 1.6 :(
However the uima-core version had to be switched to 2.3.1 release (the snapshot 
one was the first jar I uploaded just some days before the release).
Thanks for taking care.


 UIMA jars are missing version numbers
 -

 Key: SOLR-2427
 URL: https://issues.apache.org/jira/browse/SOLR-2427
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.1
Reporter: Grant Ingersoll
Assignee: Steven Rowe
Priority: Blocker
 Fix For: 3.1, 3.2, 4.0


 We should have version numbers on the UIMA jar files.




[jira] Commented: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007085#comment-13007085
 ] 

Steven Rowe commented on SOLR-2427:
---

bq. I found the problem being (damn) silent JVM update in Mac OSX which 
symlinked 1.5 Java version to 1.6

Apple rocks!

bq. However the uima-core version had to be switched to 2.3.1 release (the 
snapshot one was the first jar I uploaded just some days before the release).

The manifest in {{solr/contrib/uima/lib/uima-core.jar}} listed the version as 
2.3.1-SNAPSHOT, and when I did a diff with the jar from the maven central repo, 
all of the .class files were different.  So I'm not sure what happened here, 
but the jar in Solr's source tree was definitely not the same as the released 
jar.  Maybe the released 2.3.1 jar you posted was never committed?  I don't 
know.

Anyway, it's fixed now.

bq. Thanks for taking care.

No problem.




[jira] Created: (SOLR-2429) ability to not cache a filter

2011-03-15 Thread Yonik Seeley (JIRA)
ability to not cache a filter
-

 Key: SOLR-2429
 URL: https://issues.apache.org/jira/browse/SOLR-2429
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley


A user should be able to add {!cache=false} to a query or filter query.




[jira] Commented: (SOLR-2429) ability to not cache a filter

2011-03-15 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007122#comment-13007122
 ] 

Yonik Seeley commented on SOLR-2429:


The annoying part here is we need more metadata than just Query that we use 
now for a filter.
Unfortunately, SolrIndexSearcher uses List<Query> everywhere.

We could create something like a SolrQuery extends Query that wrapped a normal 
query and added additional metadata (like cache options).  That's a bit messier 
since we'd have instanceof checks and casts everywhere though.

Another option is to create a SolrQuery class that does not extend Query - 
hence methods taking List<Query> would now need to take List<SolrQuery>

{code}
class SolrQuery {
  Query q;
  QParser qparser;
  boolean cache;
  ...
}
{code}

Thoughts?




[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007123#comment-13007123
 ] 

Robert Muir commented on LUCENE-2960:
-

It's exactly the lack of consensus we see here that makes me 100% against 
the setter approach.

I'm totally against some deprecation/undeprecation loop every time, in a future 
release, another setting wants to be live.

It seems the only way we can avoid this is for javadoc to be the only 
specification as to whether a setting does or does not take effect live.





[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-15 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007136#comment-13007136
 ] 

Earwin Burrfoot commented on LUCENE-2960:
-

You avoid deprecation/undeprecation and binary incompatibility, while 
incompatibly changing semantics. What do you win?




[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007139#comment-13007139
 ] 

Robert Muir commented on LUCENE-2960:
-

You win the fact that this is such an expert thing, and it should not confuse 
99% of users who won't need to change these settings in a live way.

This is a central API for using Lucene. Sorry, I would rather see IWConfig 
reverted completely than see this deprecation/undeprecation loop; it would just 
cause too much confusion.





[jira] Commented: (LUCENE-2967) Use linear probing with an additional good bit avalanching function in FST's NodeHash.

2011-03-15 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007164#comment-13007164
 ] 

Dawid Weiss commented on LUCENE-2967:
-

Yes, now I see this difference on the 38M too:

trunk:
{noformat}
56.462
55.725
55.544
55.522
{noformat}
w/patch:
{noformat}
59.9
59.6
{noformat}

I'll see if I can find out the problem here; I assume the collision ratio 
should be nearly identical... but who knows. This is of no priority, but 
interesting stuff. I'll close if I can't get it better than the trunk version.

 Use linear probing with an additional good bit avalanching function in FST's 
 NodeHash.
 --

 Key: LUCENE-2967
 URL: https://issues.apache.org/jira/browse/LUCENE-2967
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial
 Fix For: 4.0

 Attachments: LUCENE-2967.patch


 I recently had an interesting discussion with Sebastiano Vigna (fastutil), 
 who suggested that linear probing, given a hash mixing function with good 
 avalanche properties, is a way better method of constructing lookups in 
 associative arrays compared to quadratic probing. Indeed, with linear probing 
 you can implement removals from a hash map without removed slot markers and 
 linear probing has nice properties with respect to modern CPUs (caches). I've 
 reimplemented HPPC's hash maps to use linear probing and we observed a nice 
 speedup (the same applies for fastutils of course).
 This patch changes NodeHash's implementation to use linear probing. The code 
 is a bit simpler (I think :). I also moved the load factor to a constant -- 
 0.5 seems like a generous load factor, especially if we allow large FSTs to 
 be built. I don't see any significant speedup in constructing large automata, 
 but there is no slowdown either (I checked on one machine only for now, but 
 will verify on other machines too).
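
 For readers unfamiliar with the technique, the description above can be 
 sketched roughly as follows. This is not the code from LUCENE-2967.patch or 
 from HPPC; the class name LinearProbingIntSet, the restriction to 
 non-negative int keys, and the choice of the MurmurHash3 finalizer as the 
 avalanching mix are all assumptions made for illustration. It shows linear 
 probing with a constant 0.5 load factor and removal by shifting entries back 
 instead of leaving removed-slot markers.

```java
/** Open-addressing set of non-negative ints; a sketch, not Lucene's NodeHash. */
final class LinearProbingIntSet {
  private static final int EMPTY = -1;           // sentinel, so keys must be >= 0
  private static final double LOAD_FACTOR = 0.5; // constant load factor, as in the description

  private int[] table = newTable(16);            // capacity stays a power of two
  private int size;

  private static int[] newTable(int capacity) {
    int[] t = new int[capacity];
    java.util.Arrays.fill(t, EMPTY);
    return t;
  }

  /** MurmurHash3's 32-bit finalizer, assumed here as the avalanching mix. */
  static int mix(int h) {
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
  }

  private int homeSlot(int key) {
    return mix(key) & (table.length - 1);
  }

  int size() {
    return size;
  }

  boolean add(int key) {
    if (key < 0) throw new IllegalArgumentException("keys must be >= 0");
    if (size + 1 > table.length * LOAD_FACTOR) grow();
    int mask = table.length - 1;
    int i = homeSlot(key);
    while (table[i] != EMPTY) {                  // probe consecutive slots
      if (table[i] == key) return false;
      i = (i + 1) & mask;
    }
    table[i] = key;
    size++;
    return true;
  }

  boolean contains(int key) {
    int mask = table.length - 1;
    int i = homeSlot(key);
    while (table[i] != EMPTY) {
      if (table[i] == key) return true;
      i = (i + 1) & mask;
    }
    return false;
  }

  /** Removes key by shifting later entries back; no removed-slot marker is left. */
  boolean remove(int key) {
    int mask = table.length - 1;
    int i = homeSlot(key);
    while (table[i] != EMPTY && table[i] != key) i = (i + 1) & mask;
    if (table[i] == EMPTY) return false;
    int j = i;
    while (true) {
      j = (j + 1) & mask;
      if (table[j] == EMPTY) break;
      int h = homeSlot(table[j]);
      // The entry at j may stay only if its home slot lies cyclically in (i, j].
      boolean reachable = (i <= j) ? (i < h && h <= j) : (i < h || h <= j);
      if (!reachable) {
        table[i] = table[j];                     // fill the hole, move the hole to j
        i = j;
      }
    }
    table[i] = EMPTY;
    size--;
    return true;
  }

  private void grow() {
    int[] old = table;
    table = newTable(old.length * 2);
    size = 0;
    for (int k : old) {
      if (k != EMPTY) add(k);
    }
  }
}
```

 Because the load factor never exceeds 0.5, every probe sequence is guaranteed 
 to reach an empty slot, which is what lets the loops above terminate without 
 a separate bound check.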




[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-15 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007205#comment-13007205
 ] 

Steven Rowe commented on LUCENE-2960:
-

bq. How about an IWC base class, extended by IWCinit and IWClive. IWCinit has 
setters for everything, and IW.getConfig() returns IWClive, which has no 
setters for things you can't set on the fly.

I tried to implement this, but couldn't figure out a way to avoid code and 
javadoc duplication and/or separation for the live setters, which need to be on 
both the init and live versions.  Duplication/separation of this sort would be 
begging for trouble.  (The live setters can't be on the base class because the 
init and live versions would have to return different types to allow for proper 
chaining.)
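
To make the duplication concrete, here is a rough sketch of the shape being 
described. It is not taken from any attached patch: ConfigBase, InitConfig and 
LiveConfig are hypothetical stand-ins for IWC, IWCinit and IWClive, and the 
default values are arbitrary placeholders.

```java
// Hypothetical stand-ins for the IWC / IWCinit / IWClive idea; not Lucene's API.
abstract class ConfigBase {
  protected double ramBufferSizeMB = 16.0;  // placeholder default
  protected int termIndexInterval = 128;    // placeholder default

  public double getRAMBufferSizeMB() { return ramBufferSizeMB; }
  public int getTermIndexInterval() { return termIndexInterval; }
}

/** What a writer would hand back: only settings that may change on the fly. */
final class LiveConfig extends ConfigBase {
  /** Live setter, copy 1: returns LiveConfig so calls chain on the live type. */
  public LiveConfig setRAMBufferSizeMB(double mb) {
    this.ramBufferSizeMB = mb;
    return this;
  }
}

/** What the caller builds up front: every setter is available. */
final class InitConfig extends ConfigBase {
  /** Live setter, copy 2: same body and javadoc, but it must return InitConfig. */
  public InitConfig setRAMBufferSizeMB(double mb) {
    this.ramBufferSizeMB = mb;
    return this;
  }

  /** Init-only setter. */
  public InitConfig setTermIndexInterval(int interval) {
    this.termIndexInterval = interval;
    return this;
  }
}

final class ConfigChainingDemo {
  public static void main(String[] args) {
    InitConfig init = new InitConfig().setRAMBufferSizeMB(32.0).setTermIndexInterval(64);
    LiveConfig live = new LiveConfig().setRAMBufferSizeMB(48.0);
    System.out.println(init.getRAMBufferSizeMB() + " " + init.getTermIndexInterval());
    System.out.println(live.getRAMBufferSizeMB());
  }
}
```

If setRAMBufferSizeMB were declared once on ConfigBase and returned 
ConfigBase, the chained call to setTermIndexInterval in main would no longer 
compile, because the base type has no such method. That is why the live 
setter ends up written twice in this sketch.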




[jira] Updated: (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-15 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2952:


Attachment: LUCENE-2952.patch

This minimizes the number of calls to validate (there is still one extra call 
via the benchmark module since it invokes the common lucene compile target).  
Also splits it out into Lucene, Solr and Modules.

I'd consider it close to good enough at this point.

 Make license checking/maintenance easier/automated
 --

 Key: LUCENE-2952
 URL: https://issues.apache.org/jira/browse/LUCENE-2952
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
 Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, 
 LUCENE-2952.patch, LUCENE-2952.patch


 Instead of waiting until release to check licenses are valid, we should make 
 it a part of our build process to ensure that all dependencies have proper 
 licenses, etc.




[jira] Commented: (SOLR-2429) ability to not cache a filter

2011-03-15 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007217#comment-13007217
 ] 

Hoss Man commented on SOLR-2429:


why not extend Query? ... it could actually rewrite to the Query it wraps, 
giving us the best of both worlds.

FWIW: it also seems like it would make sense for this type of syntax/decoration 
to work with the q param (skipping the queryResultCache)
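
A rough sketch of the "extend Query and rewrite to the wrapped query" idea, 
for illustration only. SketchQuery is a simplified stand-in for Lucene's 
Query (whose real rewrite method takes an IndexReader), and SketchTermQuery 
and CacheHintQuery are hypothetical names, not classes from Solr or from any 
patch on this issue.

```java
// Simplified stand-in for Lucene's Query; the real rewrite takes an IndexReader.
abstract class SketchQuery {
  SketchQuery rewrite() {
    return this;
  }
}

/** Stand-in for an ordinary primitive query. */
final class SketchTermQuery extends SketchQuery {
  final String field;
  final String text;

  SketchTermQuery(String field, String text) {
    this.field = field;
    this.text = text;
  }
}

/** Carries a cache hint alongside a wrapped query and rewrites to that query. */
final class CacheHintQuery extends SketchQuery {
  final SketchQuery wrapped;
  final boolean cache;

  CacheHintQuery(SketchQuery wrapped, boolean cache) {
    this.wrapped = wrapped;
    this.cache = cache;
  }

  @Override
  SketchQuery rewrite() {
    // Execution code sees only the wrapped query after rewriting.
    return wrapped.rewrite();
  }
}

final class CacheHintDemo {
  public static void main(String[] args) {
    SketchQuery inner = new SketchTermQuery("cat", "electronics");
    CacheHintQuery fq = new CacheHintQuery(inner, false);
    System.out.println("cache=" + fq.cache + ", rewrites to inner=" + (fq.rewrite() == inner));
  }
}
```

In this shape, code that cares about caching can check for the wrapper and 
read the flag, while everything downstream of rewrite keeps working with the 
plain query.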




RE: [HUDSON] Lucene-Solr-tests-only-3.x - Build # 5964 - Failure

2011-03-15 Thread Steven A Rowe
The build never made it past the initial pre-build ant clean:

---
clean:
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/build

BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/build.xml:114: The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/common-build.xml:191: Unable to delete file /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/build/backwards/test/6/index.TestLockFactory6.-6904310916879757798/_2b.fdx
  
---


 -Original Message-
 From: Apache Hudson Server [mailto:hud...@hudson.apache.org]
 Sent: Tuesday, March 15, 2011 5:56 PM
 To: dev@lucene.apache.org
 Subject: [HUDSON] Lucene-Solr-tests-only-3.x - Build # 5964 - Failure
 
 Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/5964/
 
 6 tests failed.
 FAILED:  TEST-org.apache.lucene.index.TestIndexWriter.xml.init
 
 Error Message:
 
 
 Stack Trace:
 Test report file /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/build/backwards/test/TEST-org.apache.lucene.index.TestIndexWriter.xml was length 0
 
 FAILED:  TEST-org.apache.lucene.search.TestBoolean2.xml.init
 
 Error Message:
 
 
 Stack Trace:
 Test report file /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/build/backwards/test/TEST-org.apache.lucene.search.TestBoolean2.xml was length 0
 
 REGRESSION:  org.apache.lucene.store.TestLockFactory.testStressLocks
 
 Error Message:
 IndexWriter hit unexpected exceptions
 
 Stack Trace:
 junit.framework.AssertionFailedError: IndexWriter hit unexpected exceptions
   at org.apache.lucene.store.TestLockFactory._testStressLocks(TestLockFactory.java:172)
   at org.apache.lucene.store.TestLockFactory.testStressLocks(TestLockFactory.java:142)
   at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:255)
 
 
 FAILED:  init.org.apache.lucene.store.TestRAMDirectory
 
 Error Message:
 org.apache.lucene.store.TestRAMDirectory
 
 Stack Trace:
 java.lang.ClassNotFoundException: org.apache.lucene.store.TestRAMDirectory
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:186)
 
 
 FAILED:  init.org.apache.lucene.util.TestNumericUtils
 
 Error Message:
 org.apache.lucene.util.TestNumericUtils
 
 Stack Trace:
 java.lang.ClassNotFoundException: org.apache.lucene.util.TestNumericUtils
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:186)
 
 
 FAILED:  init.org.apache.lucene.util.TestSmallFloat
 
 Error Message:
 org.apache.lucene.util.TestSmallFloat
 
 Stack Trace:
 java.lang.ClassNotFoundException: org.apache.lucene.util.TestSmallFloat
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:186)
 
 
 
 
 Build Log (for compile errors):
 [...truncated 47 lines...]
 
 
 



[jira] Created: (LUCENE-2968) SurroundQuery doesn't support SpanNot

2011-03-15 Thread Grant Ingersoll (JIRA)
SurroundQuery doesn't support SpanNot
-

 Key: LUCENE-2968
 URL: https://issues.apache.org/jira/browse/LUCENE-2968
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor


It would be nice if we could do SpanNot in the surround query, as it is 
quite useful for keeping searches within a boundary (say, a sentence).




[jira] Commented: (LUCENE-2749) Co-occurrence filter

2011-03-15 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007229#comment-13007229
 ] 

Steven Rowe commented on LUCENE-2749:
-

bq. this filter would definitely be something that i could use

What use case(s) are you thinking of?

 Co-occurrence filter
 

 Key: LUCENE-2749
 URL: https://issues.apache.org/jira/browse/LUCENE-2749
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Analysis
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Priority: Minor
 Fix For: 4.0


 The co-occurrence filter to be developed here will output sets of tokens that 
 co-occur within a given window onto a token stream.  
 These token sets can be ordered either lexically (to allow order-independent 
 matching/counting) or positionally (e.g. sliding windows of positionally 
 ordered co-occurring terms that include all terms in the window are called 
 n-grams or shingles). 
 The parameters to this filter will be: 
 * window size: this can be a fixed sequence length, sentence/paragraph 
 context (these will require sentence/paragraph segmentation, which is not in 
 Lucene yet), or over the entire token stream (full field width)
 * minimum number of co-occurring terms: >= 2
 * maximum number of co-occurring terms: <= window size
 * token set ordering (lexical or positional)
 One use case for co-occurring token sets is as candidates for collocations.
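
 As a rough illustration of the simplest configuration described above (a 
 fixed window, exactly two co-occurring terms, lexical ordering), the token 
 sets could be produced along these lines. This is a plain-Java sketch over 
 a token list, not a Lucene TokenFilter and not code from this issue; the 
 class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: lexically ordered co-occurring token pairs in a sliding window. */
final class CooccurrenceSketch {

  /**
   * Returns every distinct pair of different tokens that appear within the same
   * window of `window` consecutive positions. Each pair is ordered lexically so
   * that "york new" and "new york" count as the same set.
   */
  static Set<String> pairs(List<String> tokens, int window) {
    Set<String> out = new TreeSet<>();
    for (int i = 0; i < tokens.size(); i++) {
      for (int j = i + 1; j < tokens.size() && j < i + window; j++) {
        String a = tokens.get(i);
        String b = tokens.get(j);
        if (a.equals(b)) continue;  // a set needs two distinct terms
        out.add(a.compareTo(b) <= 0 ? a + " " + b : b + " " + a);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> tokens = Arrays.asList("new", "york", "city");
    System.out.println(new ArrayList<>(pairs(tokens, 2)));
    System.out.println(new ArrayList<>(pairs(tokens, 3)));
  }
}
```

 Positional ordering, larger set sizes, and sentence or paragraph windows 
 would need more machinery than this sketch shows.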




[jira] Commented: (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ

2011-03-15 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007236#comment-13007236
 ] 

Ahmet Arslan commented on SOLR-1499:


Hi,

Can I use this to upgrade the Solr version in cases where the Lucene/Solr 
indices are not compatible?

Thanks,
Ahmet

 SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via 
 SolrJ
 -

 Key: SOLR-1499
 URL: https://issues.apache.org/jira/browse/SOLR-1499
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Reporter: Lance Norskog
Assignee: Erik Hatcher
 Fix For: Next

 Attachments: SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, 
 SOLR-1499.patch, SOLR-1499.patch


 The SolrEntityProcessor queries an external Solr instance. The Solr documents 
 returned are unpacked and emitted as DIH fields.
 The SolrEntityProcessor uses the following attributes:
 * solr='http://localhost:8983/solr/sms'
 ** This gives the URL of the target Solr instance.
 *** Note: the connection to the target Solr uses the binary SolrJ format.
 * query='Jefferson&sort=id+asc'
 ** This gives the base query string used with Solr. It can include any 
 standard Solr request parameter. This attribute is processed under the 
 variable resolution rules and can be driven in an inner stage of the indexing 
 pipeline.
 * rows='10'
 ** This gives the number of rows to fetch per request.
 ** The SolrEntityProcessor always fetches every document that matches the 
 request.
 * fields='id,tag'
 ** This selects the fields to be returned from the Solr request.
 ** These must also be declared as field elements.
 ** As with all fields, template processors can be used to alter the contents 
 to be passed downwards.
 * timeout='30'
 ** This limits the query to 30 seconds. This can be used as a fail-safe to 
 prevent the indexing session from freezing up. By default the timeout is 5 
 minutes.
 Limitations:
 * Solr errors are not handled correctly.
 * Loop control constructs have not been tested.
 * Multi-valued returned fields have not been tested.
 The unit tests give examples of how to use it as the root entity and an inner 
 entity.




[jira] Commented: (SOLR-2429) ability to not cache a filter

2011-03-15 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007265#comment-13007265
 ] 

Ryan McKinley commented on SOLR-2429:
-

I'm not sure this is related -- it could be -- but I'm looking at writing a 
custom query from:
{code:java}
  @Override
  public Query getFieldQuery(QParser parser, SchemaField field, String externalVal)
{code}

and it would be great to know if this is used as a filter or not -- should it 
include scoring?  Are there ways to build the query where some parts are cached 
and some are not?






[jira] Updated: (LUCENE-2969) fix two stopwords typos

2011-03-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2969:


Attachment: LUCENE-2969.patch

 fix two stopwords typos
 ---

 Key: LUCENE-2969
 URL: https://issues.apache.org/jira/browse/LUCENE-2969
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor
 Attachments: LUCENE-2969.patch


 See:
 http://svn.tartarus.org/snowball?view=rev&revision=543
 http://permalink.gmane.org/gmane.comp.search.snowball/1249




[jira] Created: (LUCENE-2969) fix two stopwords typos

2011-03-15 Thread Robert Muir (JIRA)
fix two stopwords typos
---

 Key: LUCENE-2969
 URL: https://issues.apache.org/jira/browse/LUCENE-2969
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor
 Attachments: LUCENE-2969.patch

See:

http://svn.tartarus.org/snowball?view=rev&revision=543
http://permalink.gmane.org/gmane.comp.search.snowball/1249





[jira] Commented: (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ

2011-03-15 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007314#comment-13007314
 ] 

Lance Norskog commented on SOLR-1499:
-

Yes you can!

* The source index has to store all of the fields.
* I would do a series of short queries rather than one long one.

Thank you for thinking of this.




[jira] Commented: (SOLR-2429) ability to not cache a filter

2011-03-15 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007324#comment-13007324
 ] 

Otis Gospodnetic commented on SOLR-2429:


I'm with Hoss.  For many months now, I've been dreaming about the possibility 
of telling Solr to execute a query without caching the results.




[jira] Commented: (SOLR-2429) ability to not cache a filter

2011-03-15 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007340#comment-13007340
 ] 

David Smiley commented on SOLR-2429:


Heh, me too!  I was pondering this last night; I know specific queries will 
needlessly pollute the cache.  I was imagining a syntax such as this:  
fq={!cache=no}queryhere




Re: [jira] Resolved: (SOLR-2426) Build failing

2011-03-15 Thread Bill Bell
This started working when I did the following:

#cd C:\Users\bbell\solr
#ant compile
#cd solr
#ant example

If I ran ant example directly, it gave the errors below. I'll
double-check my Java version too.

On 3/15/11 5:53 AM, Robert Muir (JIRA) j...@apache.org wrote:


 [ 
https://issues.apache.org/jira/browse/SOLR-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-2426.
---

Resolution: Not A Problem

Trunk requires java 6.
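
A quick way to check which Java version a JVM reports is sketched below. This is only an illustration: the class and method names are made up, and it inspects the JVM that runs the snippet, which may not be the JDK that Ant picks up (for example via JAVA_HOME).

```java
// Sketch only: reports whether the running JVM meets a minimum Java version.
class JavaVersionCheck {

    // Turn a java.specification.version value such as "1.5", "1.6" or "11"
    // into a single major number (1.6 -> 6, 11 -> 11).
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    // True when the given spec version is at least the required major version.
    static boolean meetsMinimum(String specVersion, int requiredMajor) {
        return majorVersion(specVersion) >= requiredMajor;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("java.specification.version = " + spec);
        System.out.println("meets Java 6 requirement: " + meetsMinimum(spec, 6));
    }
}
```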

 Build failing
 -

 Key: SOLR-2426
 URL: https://issues.apache.org/jira/browse/SOLR-2426
 Project: Solr
  Issue Type: Bug
Reporter: Bill Bell

 ant clean
 ant example
 trunk
 [javac]  ^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSetHitCollector.java:77: incompatible types
 [javac] found   : org.apache.solr.search.BitDocSet
 [javac] required: org.apache.solr.search.DocSet
 [javac]   return new BitDocSet(bits,pos);
 [javac]  ^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSetHitCollector.java:132: incompatible types
 [javac] found   : org.apache.solr.search.SortedIntDocSet
 [javac] required: org.apache.solr.search.DocSet
 [javac]   return new SortedIntDocSet(scratch, pos);
 [javac]  ^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSetHitCollector.java:136: incompatible types
 [javac] found   : org.apache.solr.search.BitDocSet
 [javac] required: org.apache.solr.search.DocSet
 [javac]   return new BitDocSet(bits,pos);
 [javac]  ^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:26: org.apache.solr.search.DocSlice is not abstract and does not override abstract method getTopFilter() in org.apache.solr.search.DocSet
 [javac] public class DocSlice extends DocSetBase implements DocList {
 [javac]^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:54: incompatible types
 [javac] found   : org.apache.solr.search.DocSlice
 [javac] required: org.apache.solr.search.DocList
 [javac] if (this.offset == offset && this.len==len) return this;
 [javac]^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:62: incompatible types
 [javac] found   : org.apache.solr.search.DocSlice
 [javac] required: org.apache.solr.search.DocList
 [javac] if (this.offset == offset && this.len == realLen) return this;
 [javac]  ^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:63: incompatible types
 [javac] found   : org.apache.solr.search.DocSlice
 [javac] required: org.apache.solr.search.DocList
 [javac] return new DocSlice(offset, realLen, docs, scores, matches, maxScore);
 [javac]^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:130: intersection(org.apache.solr.search.DocSet) in org.apache.solr.search.DocSet cannot be applied to (org.apache.solr.search.DocSlice)
 [javac]   return other.intersection(this);
 [javac]   ^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:139: intersectionSize(org.apache.solr.search.DocSet) in org.apache.solr.search.DocSet cannot be applied to (org.apache.solr.search.DocSlice)
 [javac]   return other.intersectionSize(this);
 [javac]   ^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\ExtendedDismaxQParserPlugin.java:829: warning: [unchecked] unchecked conversion
 [javac] found   : java.util.List
 [javac] required: java.util.List<org.apache.lucene.search.BooleanClause>
 [javac]   Query q = super.getBooleanQuery(clauses, disableCoord);
 [javac]   ^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\ExtendedDismaxQParserPlugin.java:845: warning: [unchecked] unchecked conversion
 [javac] found   : java.util.List
 [javac] required: java.util.List<org.apache.lucene.search.BooleanClause>
 [javac]   super.addClause(clauses, conj, mods, q);
 [javac]   ^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\FastLRUCache.java:107: warning: [unchecked] unchecked cast
 [javac] found   : java.lang.Object
 [javac] required: java.util.List<org.apache.solr.common.util.ConcurrentLRUCache.Stats>
 [javac] statsList = (List<ConcurrentLRUCache.Stats>) persistence;
 [javac]  ^
 [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\FastLRUCache.java:263: warning: [unchecked] unchecked 

[jira] Commented: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007345#comment-13007345
 ] 

Bill Bell commented on SOLR-2242:
-

OK, I did the required work. Can we get more feedback or get it committed? What 
else is needed?

 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR.2242.v2.patch


 When returning facet.field=<name of field> you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=manu&facet.mincount=1&facet.limit=-1&f.manu.facet.namedistinct=0&facet.field=price&f.price.facet.namedistinct=1
 Here is an example on field hgid (without namedistinct):
 {code}
 <lst name="facet_fields">
   <lst name="hgid">
     <int name="HGPY045FD36D4000A">1</int>
     <int name="HGPY0FBC6690453A9">1</int>
     <int name="HGPY1E44ED6C4FB3B">1</int>
     <int name="HGPY1FA631034A1B8">1</int>
     <int name="HGPY3317ABAC43B48">1</int>
     <int name="HGPY3A17B2294CB5A">5</int>
     <int name="HGPY3ADD2B3D48C39">1</int>
   </lst>
 </lst>
 {code}
 With namedistinct, the terms above (HGPY045FD36D4000A, HGPY0FBC6690453A9, 
 HGPY1E44ED6C4FB3B, HGPY1FA631034A1B8, HGPY3317ABAC43B48, 
 HGPY3A17B2294CB5A, HGPY3ADD2B3D48C39) are counted. This returns the number 
 of rows (7), not the number of values (11).
 {code}
 <lst name="facet_fields">
   <lst name="hgid">
     <int name="_count_">7</int>
   </lst>
 </lst>
 {code}
 This actually works really well to get the total number of fields for a 
 group.field=hgid. Enjoy!
