[jira] [Updated] (SOLR-3973) Cross facet

2012-10-22 Thread ZhengBowen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhengBowen updated SOLR-3973:
-

Attachment: crossfacet.patch

the patch of cross facet.

 Cross facet
 ---

 Key: SOLR-3973
 URL: https://issues.apache.org/jira/browse/SOLR-3973
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 3.5
Reporter: ZhengBowen
  Labels: cross, facet, solr
 Fix For: 3.5

 Attachments: crossfacet.patch


 We often come across the scenario of crossing multiple facets; for example, the
 SQL statement: select count(*) from table1 group by A,B.
 We slightly modified FacetComponent so that this component is able to
 support the multi-faceted cross.
 Request parameters are as follows:
 start=0&rows=0&q=*:*&facet=true&facet.field=user_city&facet.field=user_province&facet.limit=10&facet.cross=true&facet.cross.sep=,
 The effect of the new feature is as follows:
 <response>
 <lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">84</int>
 <lst name="params">
 <str name="facet.cross">true</str>
 <str name="facet">true</str>
 <str name="shards">
 10.253.93.71:62511/solr,10.253.93.71:62512/solr,10.253.93.71:62513/solr,10.253.93.71:62514/solr,
 </str>
 <str name="facet.cross.sep">,</str>
 <str name="start">0</str>
 <str name="q">*:*</str>
 <str name="facet.limit">10</str>
 <arr name="facet.field">
 <str>user_city</str>
 <str>user_province</str>
 </arr>
 <str name="rows">0</str>
 </lst>
 </lst>
 <result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
 <lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
 <lst name="user_city,user_province">
 <int name="Beijing,Beijing">16852</int>
 <int name="Shanghai,Shanghai">16787</int>
 <int name="Guangzhou,Gunagdong">12950</int>
 <int name="Shenzheng,Guangdong">11667</int>
 <int name="Hangzhou,Zhejiang">9997</int>
 <int name="Chongqing,Chongqing">7624</int>
 <int name="Chengdu,Sichuan">7082</int>
 <int name="Wuhan,Hubei">6894</int>
 <int name="Suzhou,Jiangsu">6528</int>
 </lst>
 </lst>
 <lst name="facet_numTerms"/>
 <lst name="facet_dates"/>
 <lst name="facet_ranges"/>
 </lst>
 </response>
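
A minimal SolrJ sketch of issuing the request above, assuming the patch is
applied and that the facet.cross / facet.cross.sep parameters behave as
described in the issue; the server URL and field names are placeholders, not
part of the patch:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CrossFacetExample {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point this at a core with the patch applied.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("*:*");
        query.setStart(0);
        query.setRows(0);
        query.setFacet(true);
        query.addFacetField("user_city", "user_province");
        query.setFacetLimit(10);
        // Parameters introduced by the patch (assumed names, per the issue text).
        query.set("facet.cross", true);
        query.set("facet.cross.sep", ",");

        QueryResponse rsp = server.query(query);
        // The crossed counts come back under facet_fields as one combined list.
        System.out.println(rsp.getResponse().get("facet_counts"));
    }
}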

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3963) SOLR: map() does not allow passing recip() sub-functions

2012-10-22 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481191#comment-13481191
 ] 

Bill Bell commented on SOLR-3963:
-

Yep. Enhancement request.

Bill Bell
Sent from mobile





 SOLR: map() does not allow passing recip() sub-functions
 

 Key: SOLR-3963
 URL: https://issues.apache.org/jira/browse/SOLR-3963
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Bill Bell

 I want to do:
 boost=map(achievement_count,1,1000,recip(achievement_count,-.5,10,25),1)
 I want to return recip(achievement_count,-.5,10,25) if achievement_count is 
 between 1 and 1,000. For any other value I want to return 1.
 I cannot get it to work. I get the error below. Interestingly, this does work:
 boost=recip(map(achievement_count,0,0,-200),-.5,10,25)
 It almost appears that map() cannot take a function.
  Specified argument was out of the range of valid values.
 Parameter name: value
 Description: An unhandled exception occurred during the execution of the 
 current web request. Please review the stack trace for more information about 
 the error and where it originated in the code.
 Exception Details: System.ArgumentOutOfRangeException: Specified argument was 
 out of the range of valid values.
 Parameter name: value
 Source Error:
 An unhandled exception was generated during the execution of the current web 
 request. Information regarding the origin and location of the exception can 
 be identified using the exception stack trace below.
 Stack Trace:
 [ArgumentOutOfRangeException: Specified argument was out of the range of 
 valid values.
 Parameter name: value]
    System.Web.HttpResponse.set_StatusDescription(String value) +5200522
    FacilityService.Controllers.FacilityController.ActionCompleted(String actionName, IFacilityResults results) +265
    FacilityService.Controllers.FacilityController.SearchByPointCompleted(IFacilityResults results) +25
    lambda_method(Closure , ControllerBase , Object[] ) +114
    System.Web.Mvc.Async.<>c__DisplayClass7.<BeginExecute>b__5(IAsyncResult asyncResult) +283
    System.Web.Mvc.Async.<>c__DisplayClass41.<BeginInvokeAsynchronousActionMethod>b__40(IAsyncResult asyncResult) +22
    System.Web.Mvc.Async.<>c__DisplayClass3b.<BeginInvokeActionMethodWithFilters>b__35() +120
    System.Web.Mvc.Async.<>c__DisplayClass51.<InvokeActionMethodFilterAsynchronously>b__4b() +452
    System.Web.Mvc.Async.<>c__DisplayClass39.<BeginInvokeActionMethodWithFilters>b__38(IAsyncResult asyncResult) +15
    System.Web.Mvc.Async.<>c__DisplayClass2c.<BeginInvokeAction>b__22() +33
    System.Web.Mvc.Async.<>c__DisplayClass27.<BeginInvokeAction>b__24(IAsyncResult asyncResult) +240
    System.Web.Mvc.<>c__DisplayClass19.<BeginExecuteCore>b__14(IAsyncResult asyncResult) +28
    System.Web.Mvc.Async.<>c__DisplayClass4.<MakeVoidDelegate>b__3(IAsyncResult ar) +15
    System.Web.Mvc.AsyncController.EndExecuteCore(IAsyncResult asyncResult) +63
    System.Web.Mvc.Async.<>c__DisplayClass4.<MakeVoidDelegate>b__3(IAsyncResult ar) +15
    System.Web.Mvc.<>c__DisplayClassb.<BeginProcessRequest>b__4(IAsyncResult asyncResult) +42
    System.Web.Mvc.Async.<>c__DisplayClass4.<MakeVoidDelegate>b__3(IAsyncResult ar) +15
    System.Web.CallHandlerExecutionStep.OnAsyncHandlerCompletion(IAsyncResult ar) +282
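
A minimal SolrJ sketch (an illustration, not from the issue) of sending the
failing boost; the function string is the one reported above, while the server
URL and the use of edismax are assumptions:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class MapRecipBoostExample {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; adjust to your deployment.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("*:*");
        query.set("defType", "edismax");
        // Intended behavior: use recip(achievement_count,-.5,10,25) when
        // achievement_count is between 1 and 1000, otherwise boost by 1.
        // Per SOLR-3963, this fails because map() rejects a function as the
        // "target" argument in the affected version.
        query.set("boost",
            "map(achievement_count,1,1000,recip(achievement_count,-.5,10,25),1)");
        System.out.println(server.query(query).getResults().getNumFound());
    }
}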




[jira] [Updated] (SOLR-3973) Cross facet

2012-10-22 Thread ZhengBowen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhengBowen updated SOLR-3973:
-

Description: 
We often come across the scenario of crossing multiple facets; for example, the SQL 
statement: select count(*) from table1 group by A,B.

We slightly modified FacetComponent so that this component is able to 
support the multi-faceted cross.

Request parameters are as follows:
start=0&rows=0&q=*:*&facet=true&facet.field=user_city&facet.field=user_province&facet.limit=10&facet.cross=true&facet.cross.sep=,

The original effect is as follows:
<result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="user_city">
<int name="Beijing">16852</int>
<int name="ShangHai">16787</int>
<int name="Gunagzhou">12950</int>
<int name="Shenzhen">11667</int>
<int name="Hangzhou">9997</int>
<int name="Chongqing">7624</int>
<int name="Chengdu">7082</int>
<int name="Wuhan">6894</int>
<int name="Suzhou">6528</int>
<int name="Tianjin">5822</int>
</lst>
<lst name="user_province">
<int name="Gunagdong">48621</int>
<int name="Zhengjiang">34634</int>
<int name="Jiangsu">28748</int>
<int name="Shandong">20389</int>
<int name="Fujian">18508</int>
<int name="Beijing">16852</int>
<int name="Shanghai">16787</int>
<int name="Hubei">15227</int>
<int name="Sichuan">15112</int>
<int name="Hebei">13793</int>
</lst>
</lst>
<lst name="facet_numTerms"/>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>

The effect of the new feature is as follows:

<result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
  <lst name="user_city,user_province">
  <int name="Beijing,Beijing">16852</int>
  <int name="Shanghai,Shanghai">16787</int>
  <int name="Guangzhou,Gunagdong">12950</int>
  <int name="Shenzheng,Guangdong">11667</int>
  <int name="Hangzhou,Zhejiang">9997</int>
  <int name="Chongqing,Chongqing">7624</int>
  <int name="Chengdu,Sichuan">7082</int>
  <int name="Wuhan,Hubei">6894</int>
  <int name="Suzhou,Jiangsu">6528</int>
</lst>
</lst>
<lst name="facet_numTerms"/>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
</response>

  was:
We often come across the scenario of crossing multiple facets; for example, the SQL 
statement: select count(*) from table1 group by A,B.

We slightly modified FacetComponent so that this component is able to 
support the multi-faceted cross.

Request parameters are as follows:
start=0&rows=0&q=*:*&facet=true&facet.field=user_city&facet.field=user_province&facet.limit=10&facet.cross=true&facet.cross.sep=,

The effect of the new feature is as follows:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">84</int>
<lst name="params">
<str name="facet.cross">true</str>
<str name="facet">true</str>
<str name="shards">
10.253.93.71:62511/solr,10.253.93.71:62512/solr,10.253.93.71:62513/solr,10.253.93.71:62514/solr,
</str>
<str name="facet.cross.sep">,</str>
<str name="start">0</str>
<str name="q">*:*</str>
<str name="facet.limit">10</str>
<arr name="facet.field">
<str>user_city</str>
<str>user_province</str>
</arr>
<str name="rows">0</str>
</lst>
</lst>
<result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="user_city,user_province">
<int name="Beijing,Beijing">16852</int>
<int name="Shanghai,Shanghai">16787</int>
<int name="Guangzhou,Gunagdong">12950</int>
<int name="Shenzheng,Guangdong">11667</int>
<int name="Hangzhou,Zhejiang">9997</int>
<int name="Chongqing,Chongqing">7624</int>
<int name="Chengdu,Sichuan">7082</int>
<int name="Wuhan,Hubei">6894</int>
<int name="Suzhou,Jiangsu">6528</int>
</lst>
</lst>
<lst name="facet_numTerms"/>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
</response>


 Cross facet
 ---

 Key: SOLR-3973
 URL: https://issues.apache.org/jira/browse/SOLR-3973
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 3.5
Reporter: ZhengBowen
  Labels: cross, facet, solr
 Fix For: 3.5

 Attachments: crossfacet.patch


 We often come across the scenario of crossing multiple facets; for example, the
 SQL statement: select count(*) from table1 group by A,B.
 We slightly modified FacetComponent so that this component is able to
 support the multi-faceted cross.
 Request parameters are as follows:
 start=0&rows=0&q=*:*&facet=true&facet.field=user_city&facet.field=user_province&facet.limit=10&facet.cross=true&facet.cross.sep=,
 The original effect is as follows:
 <result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
 <lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
 <lst name="user_city">
 <int name="Beijing">16852</int>
 <int name="ShangHai">16787</int>
 <int name="Gunagzhou">12950</int>
 <int name="Shenzhen">11667</int>
 <int name="Hangzhou">9997</int>
 <int name="Chongqing">7624</int>
 <int name="Chengdu">7082</int>
 <int name="Wuhan">6894</int>
 <int name="Suzhou">6528</int>
 <int name="Tianjin">5822</int>
 </lst>
 <lst name="user_province">
 <int name="Gunagdong">48621</int>
 <int name="Zhengjiang">34634</int>
 <int 

[jira] [Updated] (SOLR-3973) Cross facet

2012-10-22 Thread ZhengBowen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhengBowen updated SOLR-3973:
-

Description: 
We often come across the scenario of crossing multiple facets; for example, the SQL 
statement: select count(*) from table group by A,B.

We slightly modified FacetComponent so that this component is able to 
support the multi-faceted cross.

Request parameters are as follows:
start=0&rows=0&q=*:*&facet=true&facet.field=user_city&facet.field=user_province&facet.limit=10&facet.cross=true&facet.cross.sep=,

The original effect is as follows:
<result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="user_city">
<int name="Beijing">16852</int>
<int name="ShangHai">16787</int>
<int name="Gunagzhou">12950</int>
<int name="Shenzhen">11667</int>
<int name="Hangzhou">9997</int>
<int name="Chongqing">7624</int>
<int name="Chengdu">7082</int>
<int name="Wuhan">6894</int>
<int name="Suzhou">6528</int>
<int name="Tianjin">5822</int>
</lst>
<lst name="user_province">
<int name="Gunagdong">48621</int>
<int name="Zhengjiang">34634</int>
<int name="Jiangsu">28748</int>
<int name="Shandong">20389</int>
<int name="Fujian">18508</int>
<int name="Beijing">16852</int>
<int name="Shanghai">16787</int>
<int name="Hubei">15227</int>
<int name="Sichuan">15112</int>
<int name="Hebei">13793</int>
</lst>
</lst>
<lst name="facet_numTerms"/>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>

The effect of the new feature is as follows:

<result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
  <lst name="user_city,user_province">
  <int name="Beijing,Beijing">16852</int>
  <int name="Shanghai,Shanghai">16787</int>
  <int name="Guangzhou,Gunagdong">12950</int>
  <int name="Shenzheng,Guangdong">11667</int>
  <int name="Hangzhou,Zhejiang">9997</int>
  <int name="Chongqing,Chongqing">7624</int>
  <int name="Chengdu,Sichuan">7082</int>
  <int name="Wuhan,Hubei">6894</int>
  <int name="Suzhou,Jiangsu">6528</int>
</lst>
</lst>
<lst name="facet_numTerms"/>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
</response>

  was:
We often come across the scenario of crossing multiple facets; for example, the SQL 
statement: select count(*) from table1 group by A,B.

We slightly modified FacetComponent so that this component is able to 
support the multi-faceted cross.

Request parameters are as follows:
start=0&rows=0&q=*:*&facet=true&facet.field=user_city&facet.field=user_province&facet.limit=10&facet.cross=true&facet.cross.sep=,

The original effect is as follows:
<result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="user_city">
<int name="Beijing">16852</int>
<int name="ShangHai">16787</int>
<int name="Gunagzhou">12950</int>
<int name="Shenzhen">11667</int>
<int name="Hangzhou">9997</int>
<int name="Chongqing">7624</int>
<int name="Chengdu">7082</int>
<int name="Wuhan">6894</int>
<int name="Suzhou">6528</int>
<int name="Tianjin">5822</int>
</lst>
<lst name="user_province">
<int name="Gunagdong">48621</int>
<int name="Zhengjiang">34634</int>
<int name="Jiangsu">28748</int>
<int name="Shandong">20389</int>
<int name="Fujian">18508</int>
<int name="Beijing">16852</int>
<int name="Shanghai">16787</int>
<int name="Hubei">15227</int>
<int name="Sichuan">15112</int>
<int name="Hebei">13793</int>
</lst>
</lst>
<lst name="facet_numTerms"/>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>

The effect of the new feature is as follows:

<result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
  <lst name="user_city,user_province">
  <int name="Beijing,Beijing">16852</int>
  <int name="Shanghai,Shanghai">16787</int>
  <int name="Guangzhou,Gunagdong">12950</int>
  <int name="Shenzheng,Guangdong">11667</int>
  <int name="Hangzhou,Zhejiang">9997</int>
  <int name="Chongqing,Chongqing">7624</int>
  <int name="Chengdu,Sichuan">7082</int>
  <int name="Wuhan,Hubei">6894</int>
  <int name="Suzhou,Jiangsu">6528</int>
</lst>
</lst>
<lst name="facet_numTerms"/>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
</response>


 Cross facet
 ---

 Key: SOLR-3973
 URL: https://issues.apache.org/jira/browse/SOLR-3973
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 3.5
Reporter: ZhengBowen
  Labels: cross, facet, solr
 Fix For: 3.5

 Attachments: crossfacet.patch


 We often come across the scenario of crossing multiple facets; for example, the
 SQL statement: select count(*) from table group by A,B.
 We slightly modified FacetComponent so that this component is able to
 support the multi-faceted cross.
 Request parameters are as follows:
 start=0&rows=0&q=*:*&facet=true&facet.field=user_city&facet.field=user_province&facet.limit=10&facet.cross=true&facet.cross.sep=,
 The original effect is as follows:
 <result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
 <lst name="facet_counts">
 <lst 

[jira] [Updated] (SOLR-3973) Cross facet, facet on multiple columns.

2012-10-22 Thread ZhengBowen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhengBowen updated SOLR-3973:
-

Summary: Cross facet, facet on multiple columns.  (was: Cross facet)

 Cross facet, facet on multiple columns.
 ---

 Key: SOLR-3973
 URL: https://issues.apache.org/jira/browse/SOLR-3973
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 3.5
Reporter: ZhengBowen
  Labels: cross, facet, solr
 Fix For: 3.5

 Attachments: crossfacet.patch


 We often come across the scenario of crossing multiple facets; for example, the
 SQL statement: select count(*) from table group by A,B.
 We slightly modified FacetComponent so that this component is able to
 support the multi-faceted cross.
 Request parameters are as follows:
 start=0&rows=0&q=*:*&facet=true&facet.field=user_city&facet.field=user_province&facet.limit=10&facet.cross=true&facet.cross.sep=,
 The original effect is as follows:
 <result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
 <lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
 <lst name="user_city">
 <int name="Beijing">16852</int>
 <int name="ShangHai">16787</int>
 <int name="Gunagzhou">12950</int>
 <int name="Shenzhen">11667</int>
 <int name="Hangzhou">9997</int>
 <int name="Chongqing">7624</int>
 <int name="Chengdu">7082</int>
 <int name="Wuhan">6894</int>
 <int name="Suzhou">6528</int>
 <int name="Tianjin">5822</int>
 </lst>
 <lst name="user_province">
 <int name="Gunagdong">48621</int>
 <int name="Zhengjiang">34634</int>
 <int name="Jiangsu">28748</int>
 <int name="Shandong">20389</int>
 <int name="Fujian">18508</int>
 <int name="Beijing">16852</int>
 <int name="Shanghai">16787</int>
 <int name="Hubei">15227</int>
 <int name="Sichuan">15112</int>
 <int name="Hebei">13793</int>
 </lst>
 </lst>
 <lst name="facet_numTerms"/>
 <lst name="facet_dates"/>
 <lst name="facet_ranges"/>
 </lst>
 The effect of the new feature is as follows:
 <result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
 <lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
   <lst name="user_city,user_province">
   <int name="Beijing,Beijing">16852</int>
   <int name="Shanghai,Shanghai">16787</int>
   <int name="Guangzhou,Gunagdong">12950</int>
   <int name="Shenzheng,Guangdong">11667</int>
   <int name="Hangzhou,Zhejiang">9997</int>
   <int name="Chongqing,Chongqing">7624</int>
   <int name="Chengdu,Sichuan">7082</int>
   <int name="Wuhan,Hubei">6894</int>
   <int name="Suzhou,Jiangsu">6528</int>
 </lst>
 </lst>
 <lst name="facet_numTerms"/>
 <lst name="facet_dates"/>
 <lst name="facet_ranges"/>
 </lst>
 </response>




[jira] [Updated] (SOLR-3973) Cross facet, faceting on multiple columns.

2012-10-22 Thread ZhengBowen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhengBowen updated SOLR-3973:
-

Summary: Cross facet, faceting on multiple columns.  (was: Cross facet, 
facet on multiple columns.)

 Cross facet, faceting on multiple columns.
 --

 Key: SOLR-3973
 URL: https://issues.apache.org/jira/browse/SOLR-3973
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 3.5
Reporter: ZhengBowen
  Labels: cross, facet, solr
 Fix For: 3.5


 We often come across the scenario of crossing multiple facets; for example, the
 SQL statement: select count(*) from table group by A,B.
 We slightly modified FacetComponent so that this component is able to
 support the multi-faceted cross.
 Request parameters are as follows:
 start=0&rows=0&q=*:*&facet=true&facet.field=user_city&facet.field=user_province&facet.limit=10&facet.cross=true&facet.cross.sep=,
 The original effect is as follows:
 <result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
 <lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
 <lst name="user_city">
 <int name="Beijing">16852</int>
 <int name="ShangHai">16787</int>
 <int name="Gunagzhou">12950</int>
 <int name="Shenzhen">11667</int>
 <int name="Hangzhou">9997</int>
 <int name="Chongqing">7624</int>
 <int name="Chengdu">7082</int>
 <int name="Wuhan">6894</int>
 <int name="Suzhou">6528</int>
 <int name="Tianjin">5822</int>
 </lst>
 <lst name="user_province">
 <int name="Gunagdong">48621</int>
 <int name="Zhengjiang">34634</int>
 <int name="Jiangsu">28748</int>
 <int name="Shandong">20389</int>
 <int name="Fujian">18508</int>
 <int name="Beijing">16852</int>
 <int name="Shanghai">16787</int>
 <int name="Hubei">15227</int>
 <int name="Sichuan">15112</int>
 <int name="Hebei">13793</int>
 </lst>
 </lst>
 <lst name="facet_numTerms"/>
 <lst name="facet_dates"/>
 <lst name="facet_ranges"/>
 </lst>
 The effect of the new feature is as follows:
 <result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
 <lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
   <lst name="user_city,user_province">
   <int name="Beijing,Beijing">16852</int>
   <int name="Shanghai,Shanghai">16787</int>
   <int name="Guangzhou,Gunagdong">12950</int>
   <int name="Shenzheng,Guangdong">11667</int>
   <int name="Hangzhou,Zhejiang">9997</int>
   <int name="Chongqing,Chongqing">7624</int>
   <int name="Chengdu,Sichuan">7082</int>
   <int name="Wuhan,Hubei">6894</int>
   <int name="Suzhou,Jiangsu">6528</int>
 </lst>
 </lst>
 <lst name="facet_numTerms"/>
 <lst name="facet_dates"/>
 <lst name="facet_ranges"/>
 </lst>
 </response>




[jira] [Updated] (SOLR-3973) Cross facet, faceting on multiple columns.

2012-10-22 Thread ZhengBowen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhengBowen updated SOLR-3973:
-

Attachment: (was: crossfacet.patch)

 Cross facet, faceting on multiple columns.
 --

 Key: SOLR-3973
 URL: https://issues.apache.org/jira/browse/SOLR-3973
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 3.5
Reporter: ZhengBowen
  Labels: cross, facet, solr
 Fix For: 3.5


 We often come across the scenario of crossing multiple facets; for example, the
 SQL statement: select count(*) from table group by A,B.
 We slightly modified FacetComponent so that this component is able to
 support the multi-faceted cross.
 Request parameters are as follows:
 start=0&rows=0&q=*:*&facet=true&facet.field=user_city&facet.field=user_province&facet.limit=10&facet.cross=true&facet.cross.sep=,
 The original effect is as follows:
 <result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
 <lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
 <lst name="user_city">
 <int name="Beijing">16852</int>
 <int name="ShangHai">16787</int>
 <int name="Gunagzhou">12950</int>
 <int name="Shenzhen">11667</int>
 <int name="Hangzhou">9997</int>
 <int name="Chongqing">7624</int>
 <int name="Chengdu">7082</int>
 <int name="Wuhan">6894</int>
 <int name="Suzhou">6528</int>
 <int name="Tianjin">5822</int>
 </lst>
 <lst name="user_province">
 <int name="Gunagdong">48621</int>
 <int name="Zhengjiang">34634</int>
 <int name="Jiangsu">28748</int>
 <int name="Shandong">20389</int>
 <int name="Fujian">18508</int>
 <int name="Beijing">16852</int>
 <int name="Shanghai">16787</int>
 <int name="Hubei">15227</int>
 <int name="Sichuan">15112</int>
 <int name="Hebei">13793</int>
 </lst>
 </lst>
 <lst name="facet_numTerms"/>
 <lst name="facet_dates"/>
 <lst name="facet_ranges"/>
 </lst>
 The effect of the new feature is as follows:
 <result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
 <lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
   <lst name="user_city,user_province">
   <int name="Beijing,Beijing">16852</int>
   <int name="Shanghai,Shanghai">16787</int>
   <int name="Guangzhou,Gunagdong">12950</int>
   <int name="Shenzheng,Guangdong">11667</int>
   <int name="Hangzhou,Zhejiang">9997</int>
   <int name="Chongqing,Chongqing">7624</int>
   <int name="Chengdu,Sichuan">7082</int>
   <int name="Wuhan,Hubei">6894</int>
   <int name="Suzhou,Jiangsu">6528</int>
 </lst>
 </lst>
 <lst name="facet_numTerms"/>
 <lst name="facet_dates"/>
 <lst name="facet_ranges"/>
 </lst>
 </response>




[jira] [Updated] (SOLR-3973) Cross facet, faceting on multiple columns.

2012-10-22 Thread ZhengBowen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhengBowen updated SOLR-3973:
-

Attachment: crossfacet.patch

the patch of cross facet.

 Cross facet, faceting on multiple columns.
 --

 Key: SOLR-3973
 URL: https://issues.apache.org/jira/browse/SOLR-3973
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 3.5
Reporter: ZhengBowen
  Labels: cross, facet, solr
 Fix For: 3.5

 Attachments: crossfacet.patch


 We often come across the scenario of crossing multiple facets; for example, the
 SQL statement: select count(*) from table group by A,B.
 We slightly modified FacetComponent so that this component is able to
 support the multi-faceted cross.
 Request parameters are as follows:
 start=0&rows=0&q=*:*&facet=true&facet.field=user_city&facet.field=user_province&facet.limit=10&facet.cross=true&facet.cross.sep=,
 The original effect is as follows:
 <result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
 <lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
 <lst name="user_city">
 <int name="Beijing">16852</int>
 <int name="ShangHai">16787</int>
 <int name="Gunagzhou">12950</int>
 <int name="Shenzhen">11667</int>
 <int name="Hangzhou">9997</int>
 <int name="Chongqing">7624</int>
 <int name="Chengdu">7082</int>
 <int name="Wuhan">6894</int>
 <int name="Suzhou">6528</int>
 <int name="Tianjin">5822</int>
 </lst>
 <lst name="user_province">
 <int name="Gunagdong">48621</int>
 <int name="Zhengjiang">34634</int>
 <int name="Jiangsu">28748</int>
 <int name="Shandong">20389</int>
 <int name="Fujian">18508</int>
 <int name="Beijing">16852</int>
 <int name="Shanghai">16787</int>
 <int name="Hubei">15227</int>
 <int name="Sichuan">15112</int>
 <int name="Hebei">13793</int>
 </lst>
 </lst>
 <lst name="facet_numTerms"/>
 <lst name="facet_dates"/>
 <lst name="facet_ranges"/>
 </lst>
 The effect of the new feature is as follows:
 <result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
 <lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
   <lst name="user_city,user_province">
   <int name="Beijing,Beijing">16852</int>
   <int name="Shanghai,Shanghai">16787</int>
   <int name="Guangzhou,Gunagdong">12950</int>
   <int name="Shenzheng,Guangdong">11667</int>
   <int name="Hangzhou,Zhejiang">9997</int>
   <int name="Chongqing,Chongqing">7624</int>
   <int name="Chengdu,Sichuan">7082</int>
   <int name="Wuhan,Hubei">6894</int>
   <int name="Suzhou,Jiangsu">6528</int>
 </lst>
 </lst>
 <lst name="facet_numTerms"/>
 <lst name="facet_dates"/>
 <lst name="facet_ranges"/>
 </lst>
 </response>




[jira] [Updated] (SOLR-3973) Cross facet, faceting on multiple columns.

2012-10-22 Thread ZhengBowen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhengBowen updated SOLR-3973:
-

Description: 
We often come across the scenario of crossing multiple facets; for example, the SQL 
statement: select count(*) from table group by A,B.

We slightly modified FacetComponent so that this component is able to 
support the multi-faceted cross. you can facet on 

Request parameters are as follows:
start=0&rows=0&q=*:*&facet=true&facet.field=user_city&facet.field=user_province&facet.limit=10&facet.cross=true&facet.cross.sep=,

The original effect is as follows:
<result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="user_city">
<int name="Beijing">16852</int>
<int name="ShangHai">16787</int>
<int name="Gunagzhou">12950</int>
<int name="Shenzhen">11667</int>
<int name="Hangzhou">9997</int>
<int name="Chongqing">7624</int>
<int name="Chengdu">7082</int>
<int name="Wuhan">6894</int>
<int name="Suzhou">6528</int>
<int name="Tianjin">5822</int>
</lst>
<lst name="user_province">
<int name="Gunagdong">48621</int>
<int name="Zhengjiang">34634</int>
<int name="Jiangsu">28748</int>
<int name="Shandong">20389</int>
<int name="Fujian">18508</int>
<int name="Beijing">16852</int>
<int name="Shanghai">16787</int>
<int name="Hubei">15227</int>
<int name="Sichuan">15112</int>
<int name="Hebei">13793</int>
</lst>
</lst>
<lst name="facet_numTerms"/>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>

The effect of the new feature is as follows:

<result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
  <lst name="user_city,user_province">
  <int name="Beijing,Beijing">16852</int>
  <int name="Shanghai,Shanghai">16787</int>
  <int name="Guangzhou,Gunagdong">12950</int>
  <int name="Shenzheng,Guangdong">11667</int>
  <int name="Hangzhou,Zhejiang">9997</int>
  <int name="Chongqing,Chongqing">7624</int>
  <int name="Chengdu,Sichuan">7082</int>
  <int name="Wuhan,Hubei">6894</int>
  <int name="Suzhou,Jiangsu">6528</int>
</lst>
</lst>
<lst name="facet_numTerms"/>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
</response>

  was:
We often come across the scenario of crossing multiple facets; for example, the SQL 
statement: select count(*) from table1 group by A,B.

We slightly modified FacetComponent so that this component is able to 
support the multi-faceted cross.

Request parameters are as follows:
start=0&rows=0&q=*:*&facet=true&facet.field=user_city&facet.field=user_province&facet.limit=10&facet.cross=true&facet.cross.sep=,

The original effect is as follows:
<result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="user_city">
<int name="Beijing">16852</int>
<int name="ShangHai">16787</int>
<int name="Gunagzhou">12950</int>
<int name="Shenzhen">11667</int>
<int name="Hangzhou">9997</int>
<int name="Chongqing">7624</int>
<int name="Chengdu">7082</int>
<int name="Wuhan">6894</int>
<int name="Suzhou">6528</int>
<int name="Tianjin">5822</int>
</lst>
<lst name="user_province">
<int name="Gunagdong">48621</int>
<int name="Zhengjiang">34634</int>
<int name="Jiangsu">28748</int>
<int name="Shandong">20389</int>
<int name="Fujian">18508</int>
<int name="Beijing">16852</int>
<int name="Shanghai">16787</int>
<int name="Hubei">15227</int>
<int name="Sichuan">15112</int>
<int name="Hebei">13793</int>
</lst>
</lst>
<lst name="facet_numTerms"/>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>

The effect of the new feature is as follows:

<result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
  <lst name="user_city,user_province">
  <int name="Beijing,Beijing">16852</int>
  <int name="Shanghai,Shanghai">16787</int>
  <int name="Guangzhou,Gunagdong">12950</int>
  <int name="Shenzheng,Guangdong">11667</int>
  <int name="Hangzhou,Zhejiang">9997</int>
  <int name="Chongqing,Chongqing">7624</int>
  <int name="Chengdu,Sichuan">7082</int>
  <int name="Wuhan,Hubei">6894</int>
  <int name="Suzhou,Jiangsu">6528</int>
</lst>
</lst>
<lst name="facet_numTerms"/>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
</response>


 Cross facet, faceting on multiple columns.
 --

 Key: SOLR-3973
 URL: https://issues.apache.org/jira/browse/SOLR-3973
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 3.5
Reporter: ZhengBowen
  Labels: cross, facet, solr
 Fix For: 3.5

 Attachments: crossfacet.patch


 We often come across the scenario of crossing multiple facets; for example, the
 SQL statement: select count(*) from table group by A,B.
 We slightly modified FacetComponent so that this component is able to
 support the multi-faceted cross. you can facet on 
 Request parameters are as follows:
 start=0&rows=0&q=*:*&facet=true&facet.field=user_city&facet.field=user_province&facet.limit=10&facet.cross=true&facet.cross.sep=,
 The original effect is as follows:
 <result name="response" 

[jira] [Updated] (SOLR-3973) Cross facet, faceting on multiple columns.

2012-10-22 Thread ZhengBowen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhengBowen updated SOLR-3973:
-

Description: 
We often come across the scenario of crossing multiple facets; for example, the SQL 
statement: select count(*) from table group by A,B.

We slightly modified FacetComponent so that this component is able to 
support the multi-faceted cross. You can facet on multiple columns and get the 
count result of the multi-faceted cross.

Request parameters are as follows:
start=0&rows=0&q=*:*&facet=true&facet.field=user_city&facet.field=user_province&facet.limit=10&facet.cross=true&facet.cross.sep=,

The original effect is as follows:
<result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="user_city">
<int name="Beijing">16852</int>
<int name="ShangHai">16787</int>
<int name="Gunagzhou">12950</int>
<int name="Shenzhen">11667</int>
<int name="Hangzhou">9997</int>
<int name="Chongqing">7624</int>
<int name="Chengdu">7082</int>
<int name="Wuhan">6894</int>
<int name="Suzhou">6528</int>
<int name="Tianjin">5822</int>
</lst>
<lst name="user_province">
<int name="Gunagdong">48621</int>
<int name="Zhengjiang">34634</int>
<int name="Jiangsu">28748</int>
<int name="Shandong">20389</int>
<int name="Fujian">18508</int>
<int name="Beijing">16852</int>
<int name="Shanghai">16787</int>
<int name="Hubei">15227</int>
<int name="Sichuan">15112</int>
<int name="Hebei">13793</int>
</lst>
</lst>
<lst name="facet_numTerms"/>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>

The effect of the new feature is as follows:

<result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
  <lst name="user_city,user_province">
  <int name="Beijing,Beijing">16852</int>
  <int name="Shanghai,Shanghai">16787</int>
  <int name="Guangzhou,Gunagdong">12950</int>
  <int name="Shenzheng,Guangdong">11667</int>
  <int name="Hangzhou,Zhejiang">9997</int>
  <int name="Chongqing,Chongqing">7624</int>
  <int name="Chengdu,Sichuan">7082</int>
  <int name="Wuhan,Hubei">6894</int>
  <int name="Suzhou,Jiangsu">6528</int>
</lst>
</lst>
<lst name="facet_numTerms"/>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
</response>

  was:
We often come across the scenario of crossing multiple facets; for example, the SQL 
statement: select count(*) from table group by A,B.

We slightly modified FacetComponent so that this component is able to 
support the multi-faceted cross. you can facet on 

Request parameters are as follows:
start=0&rows=0&q=*:*&facet=true&facet.field=user_city&facet.field=user_province&facet.limit=10&facet.cross=true&facet.cross.sep=,

The original effect is as follows:
<result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="user_city">
<int name="Beijing">16852</int>
<int name="ShangHai">16787</int>
<int name="Gunagzhou">12950</int>
<int name="Shenzhen">11667</int>
<int name="Hangzhou">9997</int>
<int name="Chongqing">7624</int>
<int name="Chengdu">7082</int>
<int name="Wuhan">6894</int>
<int name="Suzhou">6528</int>
<int name="Tianjin">5822</int>
</lst>
<lst name="user_province">
<int name="Gunagdong">48621</int>
<int name="Zhengjiang">34634</int>
<int name="Jiangsu">28748</int>
<int name="Shandong">20389</int>
<int name="Fujian">18508</int>
<int name="Beijing">16852</int>
<int name="Shanghai">16787</int>
<int name="Hubei">15227</int>
<int name="Sichuan">15112</int>
<int name="Hebei">13793</int>
</lst>
</lst>
<lst name="facet_numTerms"/>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>

The effect of the new feature is as follows:

<result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
  <lst name="user_city,user_province">
  <int name="Beijing,Beijing">16852</int>
  <int name="Shanghai,Shanghai">16787</int>
  <int name="Guangzhou,Gunagdong">12950</int>
  <int name="Shenzheng,Guangdong">11667</int>
  <int name="Hangzhou,Zhejiang">9997</int>
  <int name="Chongqing,Chongqing">7624</int>
  <int name="Chengdu,Sichuan">7082</int>
  <int name="Wuhan,Hubei">6894</int>
  <int name="Suzhou,Jiangsu">6528</int>
</lst>
</lst>
<lst name="facet_numTerms"/>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
</response>


 Cross facet, faceting on multiple columns.
 --

 Key: SOLR-3973
 URL: https://issues.apache.org/jira/browse/SOLR-3973
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 3.5
Reporter: ZhengBowen
  Labels: cross, facet, solr
 Fix For: 3.5

 Attachments: crossfacet.patch


 We often come across the scenario of crossing multiple facets; for example, the
 SQL statement: select count(*) from table group by A,B.
 We slightly modified FacetComponent so that this component is able to
 support the multi-faceted cross. You can facet on multiple columns and get
 the count result of the multi-faceted cross.
 Request parameters are as follows:
 

[jira] [Updated] (SOLR-3973) Cross facet, faceting on multiple columns.

2012-10-22 Thread ZhengBowen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhengBowen updated SOLR-3973:
-

Comment: was deleted

(was: the patch of cross facet.)

 Cross facet, faceting on multiple columns.
 --

 Key: SOLR-3973
 URL: https://issues.apache.org/jira/browse/SOLR-3973
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 3.5
Reporter: ZhengBowen
  Labels: cross, facet, solr
 Fix For: 3.5

 Attachments: crossfacet.patch


 We often come across the scenario of crossing multiple facets; for example, the
 SQL statement: select count(*) from table group by A,B.
 We slightly modified FacetComponent so that this component is able to
 support the multi-faceted cross. You can facet on multiple columns and get
 the count result of the multi-faceted cross.
 Request parameters are as follows:
 start=0&rows=0&q=*:*&facet=true&facet.field=user_city&facet.field=user_province&facet.limit=10&facet.cross=true&facet.cross.sep=,
 The original effect is as follows:
 <result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
 <lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
 <lst name="user_city">
 <int name="Beijing">16852</int>
 <int name="ShangHai">16787</int>
 <int name="Gunagzhou">12950</int>
 <int name="Shenzhen">11667</int>
 <int name="Hangzhou">9997</int>
 <int name="Chongqing">7624</int>
 <int name="Chengdu">7082</int>
 <int name="Wuhan">6894</int>
 <int name="Suzhou">6528</int>
 <int name="Tianjin">5822</int>
 </lst>
 <lst name="user_province">
 <int name="Gunagdong">48621</int>
 <int name="Zhengjiang">34634</int>
 <int name="Jiangsu">28748</int>
 <int name="Shandong">20389</int>
 <int name="Fujian">18508</int>
 <int name="Beijing">16852</int>
 <int name="Shanghai">16787</int>
 <int name="Hubei">15227</int>
 <int name="Sichuan">15112</int>
 <int name="Hebei">13793</int>
 </lst>
 </lst>
 <lst name="facet_numTerms"/>
 <lst name="facet_dates"/>
 <lst name="facet_ranges"/>
 </lst>
 The effect of the new feature is as follows:
 <result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
 <lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
   <lst name="user_city,user_province">
   <int name="Beijing,Beijing">16852</int>
   <int name="Shanghai,Shanghai">16787</int>
   <int name="Guangzhou,Gunagdong">12950</int>
   <int name="Shenzheng,Guangdong">11667</int>
   <int name="Hangzhou,Zhejiang">9997</int>
   <int name="Chongqing,Chongqing">7624</int>
   <int name="Chengdu,Sichuan">7082</int>
   <int name="Wuhan,Hubei">6894</int>
   <int name="Suzhou,Jiangsu">6528</int>
 </lst>
 </lst>
 <lst name="facet_numTerms"/>
 <lst name="facet_dates"/>
 <lst name="facet_ranges"/>
 </lst>
 </response>




[jira] [Commented] (SOLR-3973) Cross facet, faceting on multiple columns.

2012-10-22 Thread ZhengBowen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481198#comment-13481198
 ] 

ZhengBowen commented on SOLR-3973:
--

We often come across the scenario of crossing multiple facets; for example, the SQL 
statement: select count(*) from table group by A,B.

So, this patch is to support faceting on multiple columns, and you can get the 
count result of the multi-faceted cross.

I come from Alipay in China; we use Solr to build a multidimensional analysis 
platform for massive data.
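
A small sketch of how a client might read the crossed counts back, splitting
each entry name on the facet.cross.sep value; this is illustrative only and
assumes the response shape shown in the issue (URL and field names are
placeholders):

import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.util.NamedList;

public class ReadCrossFacetCounts {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery("*:*");
        query.setRows(0);
        query.setFacet(true);
        query.addFacetField("user_city", "user_province");
        // Assumed parameter names, per the issue text.
        query.set("facet.cross", true);
        query.set("facet.cross.sep", ",");

        NamedList<?> counts = (NamedList<?>) server.query(query)
                .getResponse().get("facet_counts");
        NamedList<?> fields = (NamedList<?>) counts.get("facet_fields");
        // One combined entry, e.g. "user_city,user_province".
        NamedList<?> crossed = (NamedList<?>) fields.get("user_city,user_province");
        for (Map.Entry<String, ?> entry : crossed) {
            // Entry names like "Beijing,Beijing" are joined by facet.cross.sep.
            String[] values = entry.getKey().split(",");
            System.out.println(values[0] + " / " + values[1] + " -> " + entry.getValue());
        }
    }
}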

 Cross facet, faceting on multiple columns.
 --

 Key: SOLR-3973
 URL: https://issues.apache.org/jira/browse/SOLR-3973
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 3.5
Reporter: ZhengBowen
  Labels: cross, facet, solr
 Fix For: 3.5

 Attachments: crossfacet.patch


 We often come across the scenario of crossing multiple facets; for example, the
 SQL statement: select count(*) from table group by A,B.
 We slightly modified FacetComponent so that this component is able to
 support the multi-faceted cross. You can facet on multiple columns and get
 the count result of the multi-faceted cross.
 Request parameters are as follows:
 start=0&rows=0&q=*:*&facet=true&facet.field=user_city&facet.field=user_province&facet.limit=10&facet.cross=true&facet.cross.sep=,
 The original effect is as follows:
 <result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
 <lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
 <lst name="user_city">
 <int name="Beijing">16852</int>
 <int name="ShangHai">16787</int>
 <int name="Gunagzhou">12950</int>
 <int name="Shenzhen">11667</int>
 <int name="Hangzhou">9997</int>
 <int name="Chongqing">7624</int>
 <int name="Chengdu">7082</int>
 <int name="Wuhan">6894</int>
 <int name="Suzhou">6528</int>
 <int name="Tianjin">5822</int>
 </lst>
 <lst name="user_province">
 <int name="Gunagdong">48621</int>
 <int name="Zhengjiang">34634</int>
 <int name="Jiangsu">28748</int>
 <int name="Shandong">20389</int>
 <int name="Fujian">18508</int>
 <int name="Beijing">16852</int>
 <int name="Shanghai">16787</int>
 <int name="Hubei">15227</int>
 <int name="Sichuan">15112</int>
 <int name="Hebei">13793</int>
 </lst>
 </lst>
 <lst name="facet_numTerms"/>
 <lst name="facet_dates"/>
 <lst name="facet_ranges"/>
 </lst>
 The effect of the new feature is as follows:
 <result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
 <lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
   <lst name="user_city,user_province">
   <int name="Beijing,Beijing">16852</int>
   <int name="Shanghai,Shanghai">16787</int>
   <int name="Guangzhou,Gunagdong">12950</int>
   <int name="Shenzheng,Guangdong">11667</int>
   <int name="Hangzhou,Zhejiang">9997</int>
   <int name="Chongqing,Chongqing">7624</int>
   <int name="Chengdu,Sichuan">7082</int>
   <int name="Wuhan,Hubei">6894</int>
   <int name="Suzhou,Jiangsu">6528</int>
 </lst>
 </lst>
 <lst name="facet_numTerms"/>
 <lst name="facet_dates"/>
 <lst name="facet_ranges"/>
 </lst>
 </response>




[jira] [Updated] (SOLR-3973) Cross facet, faceting on multiple columns.

2012-10-22 Thread ZhengBowen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhengBowen updated SOLR-3973:
-

Comment: was deleted

(was: the patch of cross facet.)

 Cross facet, faceting on multiple columns.
 --

 Key: SOLR-3973
 URL: https://issues.apache.org/jira/browse/SOLR-3973
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 3.5
Reporter: ZhengBowen
  Labels: cross, facet, solr
 Fix For: 3.5

 Attachments: crossfacet.patch


 We often come across the scenario of crossing multiple facets; for example, the
 SQL statement: select count(*) from table group by A,B.
 We slightly modified FacetComponent so that this component is able to
 support the multi-faceted cross. You can facet on multiple columns and get
 the count result of the multi-faceted cross.
 Request parameters are as follows:
 start=0&rows=0&q=*:*&facet=true&facet.field=user_city&facet.field=user_province&facet.limit=10&facet.cross=true&facet.cross.sep=,
 The original effect is as follows:
 <result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
 <lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
 <lst name="user_city">
 <int name="Beijing">16852</int>
 <int name="ShangHai">16787</int>
 <int name="Gunagzhou">12950</int>
 <int name="Shenzhen">11667</int>
 <int name="Hangzhou">9997</int>
 <int name="Chongqing">7624</int>
 <int name="Chengdu">7082</int>
 <int name="Wuhan">6894</int>
 <int name="Suzhou">6528</int>
 <int name="Tianjin">5822</int>
 </lst>
 <lst name="user_province">
 <int name="Gunagdong">48621</int>
 <int name="Zhengjiang">34634</int>
 <int name="Jiangsu">28748</int>
 <int name="Shandong">20389</int>
 <int name="Fujian">18508</int>
 <int name="Beijing">16852</int>
 <int name="Shanghai">16787</int>
 <int name="Hubei">15227</int>
 <int name="Sichuan">15112</int>
 <int name="Hebei">13793</int>
 </lst>
 </lst>
 <lst name="facet_numTerms"/>
 <lst name="facet_dates"/>
 <lst name="facet_ranges"/>
 </lst>
 The effect of the new feature is as follows:
 <result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
 <lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
   <lst name="user_city,user_province">
   <int name="Beijing,Beijing">16852</int>
   <int name="Shanghai,Shanghai">16787</int>
   <int name="Guangzhou,Gunagdong">12950</int>
   <int name="Shenzheng,Guangdong">11667</int>
   <int name="Hangzhou,Zhejiang">9997</int>
   <int name="Chongqing,Chongqing">7624</int>
   <int name="Chengdu,Sichuan">7082</int>
   <int name="Wuhan,Hubei">6894</int>
   <int name="Suzhou,Jiangsu">6528</int>
 </lst>
 </lst>
 <lst name="facet_numTerms"/>
 <lst name="facet_dates"/>
 <lst name="facet_ranges"/>
 </lst>
 </response>




Re: [JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0-ea-b58) - Build # 1917 - Failure!

2012-10-22 Thread Dawid Weiss
It's a JVM crash; you can see the hs dump above in the logs (a copy
below). It's a bug on my part that this doesn't complete with a more
informative exception message, though -- I'll take a look and fix it for
the next release.

Dawid

[junit4:junit4]  JVM J1: stdout (verbatim) 
[junit4:junit4] #
[junit4:junit4] # A fatal error has been detected by the Java Runtime
Environment:
[junit4:junit4] #
[junit4:junit4] #  SIGSEGV (0xb) at pc=0x7fd3893f9058, pid=13675,
tid=140546309019392
[junit4:junit4] #
[junit4:junit4] # JRE version: Java(TM) SE Runtime Environment (8.0-b58)
[junit4:junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b02
mixed mode linux-amd64 compressed oops)
[junit4:junit4] # Problematic frame:
[junit4:junit4] # V  [libjvm.so+0x7da058]
ParRootScanWithBarrierTwoGensClosure::do_oop(unsigned int*)+0x78
[junit4:junit4] #
[junit4:junit4] # Failed to write core dump. Core dumps have been
disabled. To enable core dumping, try "ulimit -c unlimited" before
starting Java again
[junit4:junit4] #
[junit4:junit4] # An error report file with more information is saved as:
[junit4:junit4] #
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build/analysis/common/test/J1/hs_err_pid13675.log
[junit4:junit4] #
[junit4:junit4] # If you would like to submit a bug report, please visit:
[junit4:junit4] #   http://bugreport.sun.com/bugreport/crash.jsp
[junit4:junit4] #
[junit4:junit4]  JVM J1: EOF 




[jira] [Commented] (LUCENE-4476) maven deployment scripts dont work (except from the machine you made the RC from)

2012-10-22 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481226#comment-13481226
 ] 

Uwe Schindler commented on LUCENE-4476:
---

bq. Does this also happen on windows if you sign artifacts with your GPG key?

Definitely not! The password is hidden! This is clearly a cygwin issue (and 
only if you use the cygwin console window). With the official Windows 7 cmd.exe 
in the official Windows console window, the password is not shown. I never use 
cygwin for building on Windows -- why do you, Steven? To run Ant and build 
artifacts, a plain cmd.exe is fine.

 maven deployment scripts dont work (except from the machine you made the RC 
 from)
 -

 Key: LUCENE-4476
 URL: https://issues.apache.org/jira/browse/LUCENE-4476
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-4476.patch, LUCENE-4476.patch, LUCENE-4476.patch


 Currently the maven process described in 
 http://wiki.apache.org/lucene-java/PublishMavenArtifacts does not work (on 
 mac)
 It worked fine for the 4.0-alpha and 4.0-beta releases.
 NOTE: This appears to be working on Linux so I am going with that. But it 
 seems strange that it doesn't work on Mac.
  
 {noformat}
 [artifact:install-provider] Installing provider: 
 org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-7:runtime
 [artifact:pom] Downloading: 
 org/apache/lucene/lucene-parent/4.0.0/lucene-parent-4.0.0.pom from repository 
 sonatype.releases at http://oss.sonatype.org/content/repositories/releases
 [artifact:pom] Unable to locate resource in repository
 [artifact:pom] [INFO] Unable to find resource 
 'org.apache.lucene:lucene-parent:pom:4.0.0' in repository sonatype.releases 
 (http://oss.sonatype.org/content/repositories/releases)
 [artifact:pom] Downloading: 
 org/apache/lucene/lucene-parent/4.0.0/lucene-parent-4.0.0.pom from repository 
 central at http://repo1.maven.org/maven2
 [artifact:pom] Unable to locate resource in repository
 [artifact:pom] [INFO] Unable to find resource 
 'org.apache.lucene:lucene-parent:pom:4.0.0' in repository central 
 (http://repo1.maven.org/maven2)
 [artifact:pom] An error has occurred while processing the Maven artifact 
 tasks.
 [artifact:pom]  Diagnosis:
 [artifact:pom] 
 [artifact:pom] Unable to initialize POM lucene-test-framework-4.0.0.pom: 
 Cannot find parent: org.apache.lucene:lucene-parent for project: 
 org.apache.lucene:lucene-test-framework:jar:null for project 
 org.apache.lucene:lucene-test-framework:jar:null
 [artifact:pom] Unable to download the artifact from any repository
 BUILD FAILED
 {noformat}




RE: Lucene build & ivy problems

2012-10-22 Thread Uwe Schindler
It only downloads on the first try; later builds never download anything unless 
dependencies have changed. And if you were able to *not* download them, your 
build would not succeed.

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de/

eMail: u...@thetaphi.de

 

From: Lance Norskog [mailto:goks...@gmail.com] 
Sent: Monday, October 22, 2012 5:03 AM
To: dev@lucene.apache.org
Subject: Lucene build & ivy problems

 

If I have all of the dependencies downloaded, how can I tell the build to skip 
checking the repositories?

I'm working on a somewhat dodgy internet connection. I ran 'ant example' a 
hundred times. On the 101st, I had an internet outage and the Ivy stuff 
blocked. Ever since then, the resolver hangs. I had to remove the home/.ivy2 
directory and start over. And now all of the dependencies are slowly 
downloading again over a dodgy internet cafe connection.

Is there some flag to the ant build that says 'just pretend everything is 
downloaded'?







[jira] [Commented] (LUCENE-4476) maven deployment scripts dont work (except from the machine you made the RC from)

2012-10-22 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481228#comment-13481228
 ] 

Uwe Schindler commented on LUCENE-4476:
---

Ah, also: if you run bash.exe in the official Windows console window (not 
cygwin's own), it also works. It's a bug in the dumb cygwin-internal console 
window only (why do they have it?) -- sorry, I have to rant about Cygwin; I use 
it, too, but only to execute find/sed/grep...

 maven deployment scripts dont work (except from the machine you made the RC 
 from)
 -

 Key: LUCENE-4476
 URL: https://issues.apache.org/jira/browse/LUCENE-4476
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-4476.patch, LUCENE-4476.patch, LUCENE-4476.patch


 Currently the maven process described in 
 http://wiki.apache.org/lucene-java/PublishMavenArtifacts does not work (on 
 mac)
 It worked fine for the 4.0-alpha and 4.0-beta releases.
 NOTE: This appears to be working on Linux so I am going with that. But it 
 seems strange that it doesn't work on Mac.
  
 {noformat}
 [artifact:install-provider] Installing provider: 
 org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-7:runtime
 [artifact:pom] Downloading: 
 org/apache/lucene/lucene-parent/4.0.0/lucene-parent-4.0.0.pom from repository 
 sonatype.releases at http://oss.sonatype.org/content/repositories/releases
 [artifact:pom] Unable to locate resource in repository
 [artifact:pom] [INFO] Unable to find resource 
 'org.apache.lucene:lucene-parent:pom:4.0.0' in repository sonatype.releases 
 (http://oss.sonatype.org/content/repositories/releases)
 [artifact:pom] Downloading: 
 org/apache/lucene/lucene-parent/4.0.0/lucene-parent-4.0.0.pom from repository 
 central at http://repo1.maven.org/maven2
 [artifact:pom] Unable to locate resource in repository
 [artifact:pom] [INFO] Unable to find resource 
 'org.apache.lucene:lucene-parent:pom:4.0.0' in repository central 
 (http://repo1.maven.org/maven2)
 [artifact:pom] An error has occurred while processing the Maven artifact 
 tasks.
 [artifact:pom]  Diagnosis:
 [artifact:pom] 
 [artifact:pom] Unable to initialize POM lucene-test-framework-4.0.0.pom: 
 Cannot find parent: org.apache.lucene:lucene-parent for project: 
 org.apache.lucene:lucene-test-framework:jar:null for project 
 org.apache.lucene:lucene-test-framework:jar:null
 [artifact:pom] Unable to download the artifact from any repository
 BUILD FAILED
 {noformat}




Re: [JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0-ea-b58) - Build # 1917 - Failure!

2012-10-22 Thread Dawid Weiss
Ok, everything is fine -- the default action on seeing forked-process
output is to pipe it to the logs and warn, but not throw an exception
(which is what the build did). The problem was that people used various
-D options for debugging and profiling, which resulted in legitimate
output to the process output descriptors (bypassing the System.* stream
redirectors).

I've changed the message a bit to indicate whether any output was
emitted when the exit status != 0.

Dawid

On Mon, Oct 22, 2012 at 8:47 AM, Dawid Weiss
dawid.we...@cs.put.poznan.pl wrote:
 It's a JVM crash, you can see the hs dump above in the logs (a copy
 below). It's a bug on my part that this doesn't complete with a more
 informational exception message though -- I'll take a look and fix for
 the next release.

 Dawid

 [junit4:junit4]  JVM J1: stdout (verbatim) 
 [junit4:junit4] #
 [junit4:junit4] # A fatal error has been detected by the Java Runtime
 Environment:
 [junit4:junit4] #
 [junit4:junit4] #  SIGSEGV (0xb) at pc=0x7fd3893f9058, pid=13675,
 tid=140546309019392
 [junit4:junit4] #
 [junit4:junit4] # JRE version: Java(TM) SE Runtime Environment (8.0-b58)
 [junit4:junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b02
 mixed mode linux-amd64 compressed oops)
 [junit4:junit4] # Problematic frame:
 [junit4:junit4] # V  [libjvm.so+0x7da058]
 ParRootScanWithBarrierTwoGensClosure::do_oop(unsigned int*)+0x78
 [junit4:junit4] #
 [junit4:junit4] # Failed to write core dump. Core dumps have been
 disabled. To enable core dumping, try ulimit -c unlimited before
 starting Java again
 [junit4:junit4] #
 [junit4:junit4] # An error report file with more information is saved as:
 [junit4:junit4] #
 /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build/analysis/common/test/J1/hs_err_pid13675.log
 [junit4:junit4] #
 [junit4:junit4] # If you would like to submit a bug report, please visit:
 [junit4:junit4] #   http://bugreport.sun.com/bugreport/crash.jsp
 [junit4:junit4] #
 [junit4:junit4]  JVM J1: EOF 

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3964) Solr does not return error, even though create collection unsuccessfully

2012-10-22 Thread milesli (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

milesli updated SOLR-3964:
--

Description: 
Solr does not return an error,
 even when creating/deleting a collection fails; 
 even when the request URL is incorrect;
(example: 
http://127.0.0.1:8983/solr/admin/collections?action=CREATE&name=tenancy_miles&numShards=3&numReplicas=2&collection.configName=myconf)

 even when the collection name passed already exists;

  was:
Solr does not return an error,
 even when creating a collection fails; 
 even when the request URL is incorrect;
(example: 
http://127.0.0.1:8983/solr/admin/collections?action=CREATE&name=tenancy_miles&numShards=3&numReplicas=2&collection.configName=myconf)

 even when the collection name passed already exists;


 Solr does not return error, even though create collection unsuccessfully 
 -

 Key: SOLR-3964
 URL: https://issues.apache.org/jira/browse/SOLR-3964
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
Reporter: milesli
  Labels: lack, message, response
   Original Estimate: 6h
  Remaining Estimate: 6h

 Solr does not return an error,
  even when creating/deleting a collection fails; 
  even when the request URL is incorrect;
 (example: 
 http://127.0.0.1:8983/solr/admin/collections?action=CREATE&name=tenancy_miles&numShards=3&numReplicas=2&collection.configName=myconf)
  even when the collection name passed already exists;

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3964) Solr does not return error, even though create collection unsuccessfully

2012-10-22 Thread milesli (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

milesli updated SOLR-3964:
--

Priority: Major  (was: Minor)

 Solr does not return error, even though create collection unsuccessfully 
 -

 Key: SOLR-3964
 URL: https://issues.apache.org/jira/browse/SOLR-3964
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
Reporter: milesli
  Labels: lack, message, response
   Original Estimate: 6h
  Remaining Estimate: 6h

 Solr does not return an error,
  even when creating a collection fails; 
  even when the request URL is incorrect;
 (example: 
 http://127.0.0.1:8983/solr/admin/collections?action=CREATE&name=tenancy_miles&numShards=3&numReplicas=2&collection.configName=myconf)
  even when the collection name passed already exists;

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4494) Add phonetic algorithm Match Rating approach to lucene

2012-10-22 Thread Colm Rice (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481271#comment-13481271
 ] 

Colm Rice commented on LUCENE-4494:
---

Thanks Steve. Glad to be able to contribute. The first of many :-)
Thanks for the link, I'll swot up on it.

Hi Lance, yes that's the one. I wrote that article btw!

 Add phonetic algorithm Match Rating approach to lucene
 ---

 Key: LUCENE-4494
 URL: https://issues.apache.org/jira/browse/LUCENE-4494
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.0-ALPHA
Reporter: Colm Rice
Priority: Minor
 Fix For: 4.1

   Original Estimate: 168h
  Remaining Estimate: 168h

 I want to add the MatchRatingApproach algorithm to the Lucene project. 
 What I have at the moment is a class called 
 org.apache.lucene.analysis.phonetic.MatchRatingApproach implementing 
 StringEncoder.
 I have a pretty comprehensive test file located at: 
 org.apache.lucene.analysis.phonetic.MatchRatingApproachTests
 It doesn't exactly follow the existing pattern, so I'm going to need a bit of 
 advice here. Thanks! Feel free to email.
 FYI: It's my first contribution, so be gentle :-) C# is my native language.
 Reference: http://en.wikipedia.org/wiki/Match_rating_approach
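 For readers unfamiliar with the algorithm, here is a minimal sketch of the 
 MRA encoding step as described at the Wikipedia reference above -- an 
 illustration only, not the class mentioned here (names are made up):
 {code}
 public class MatchRatingSketch {
 
   static String encode(String name) {
     String s = name.toUpperCase().replaceAll("[^A-Z]", "");
     if (s.isEmpty()) return s;
 
     // 1. Delete all vowels unless the vowel begins the word.
     StringBuilder noVowels = new StringBuilder();
     noVowels.append(s.charAt(0));
     for (int i = 1; i < s.length(); i++) {
       char c = s.charAt(i);
       if ("AEIOU".indexOf(c) < 0) noVowels.append(c);
     }
 
     // 2. Remove the second consonant of any double consonants.
     StringBuilder dedup = new StringBuilder();
     for (int i = 0; i < noVowels.length(); i++) {
       if (i == 0 || noVowels.charAt(i) != noVowels.charAt(i - 1)) {
         dedup.append(noVowels.charAt(i));
       }
     }
 
     // 3. Reduce the codex to six letters: first three plus last three.
     String codex = dedup.toString();
     if (codex.length() > 6) {
       codex = codex.substring(0, 3) + codex.substring(codex.length() - 3);
     }
     return codex;
   }
 
   public static void main(String[] args) {
     System.out.println(encode("Catherine")); // prints CTHRN
   }
 }
 {code}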

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3245) Poor performance of Hunspell with Polish Dictionary

2012-10-22 Thread Romain MERESSE (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481277#comment-13481277
 ] 

Romain MERESSE commented on SOLR-3245:
--

Same problem here, with a French dictionary in Solr 3.6.

With Hunspell: ~5 documents/s
Without Hunspell: ~280 documents/s

Has anyone found a solution?

 Poor performance of Hunspell with Polish Dictionary
 ---

 Key: SOLR-3245
 URL: https://issues.apache.org/jira/browse/SOLR-3245
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 4.0-ALPHA
 Environment: Centos 6.2, kernel 2.6.32, 2 physical CPU Xeon 5606 (4 
 cores each), 32 GB RAM, 2 SSD disks in RAID 0, java version 1.6.0_26, java 
 settings -server -Xms4096M -Xmx4096M 
Reporter: Agnieszka
  Labels: performance
 Attachments: pl_PL.zip


 In Solr 4.0 the Hunspell stemmer with a Polish dictionary has poor 
 performance, whereas the performance of hunspell from 
 http://code.google.com/p/lucene-hunspell/ in Solr 3.4 is very good. 
 Tests show:
 Solr 3.4, full import of 489017 documents:
 StempelPolishStemFilterFactory - 2908 seconds, 168 docs/sec 
 HunspellStemFilterFactory - 3922 seconds, 125 docs/sec
 Solr 4.0, full import of 489017 documents:
 StempelPolishStemFilterFactory - 3016 seconds, 162 docs/sec 
 HunspellStemFilterFactory - 44580 seconds (more than 12 hours), 11 docs/sec
 My schema is quite simple. For Hunspell I have one text field that I copy 14 
 text fields to:
 {code:xml}
 <field name="text" type="text_pl_hunspell" indexed="true" stored="false" 
        multiValued="true"/>
 <copyField source="field1" dest="text"/>

 <copyField source="field14" dest="text"/>
 {code}
 The text_pl_hunspell configuration:
 {code:xml}
 <fieldType name="text_pl_hunspell" class="solr.TextField" 
            positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="dict/stopwords_pl.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.HunspellStemFilterFactory" 
             dictionary="dict/pl_PL.dic" affix="dict/pl_PL.aff" ignoreCase="true"/>
     <!--<filter class="solr.KeywordMarkerFilterFactory" 
             protected="protwords_pl.txt"/>-->
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" 
             synonyms="dict/synonyms_pl.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="dict/stopwords_pl.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.HunspellStemFilterFactory" 
             dictionary="dict/pl_PL.dic" affix="dict/pl_PL.aff" ignoreCase="true"/>
     <filter class="solr.KeywordMarkerFilterFactory" 
             protected="dict/protwords_pl.txt"/>
   </analyzer>
 </fieldType>
 {code}
 I use a Polish dictionary (pl_PL.dic, pl_PL.aff; the files stopwords_pl.txt, 
 protwords_pl.txt and synonyms_pl.txt are empty). These are the same files I 
 used in the 3.4 version. 
 For the Polish stemmer the difference is only in the definition of the text field:
 {code}
 <field name="text" type="text_pl" indexed="true" stored="false" 
        multiValued="true"/>
 <fieldType name="text_pl" class="solr.TextField" 
            positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="dict/stopwords_pl.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StempelPolishStemFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory" 
             protected="dict/protwords_pl.txt"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" 
             synonyms="dict/synonyms_pl.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="dict/stopwords_pl.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StempelPolishStemFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory" 
             protected="dict/protwords_pl.txt"/>
   </analyzer>
 </fieldType>
 {code}
 One document has 23 fields:
 - 14 text fields copied to one text field (above) that is only indexed
 - 8 other indexed fields (2 strings, 2 tdates, 3 tint, 1 tfloat). The size of 
 one document is 3-4 kB.

--
This message is automatically generated by JIRA.
If you think it 

[jira] [Comment Edited] (SOLR-3245) Poor performance of Hunspell with Polish Dictionary

2012-10-22 Thread Romain MERESSE (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481277#comment-13481277
 ] 

Romain MERESSE edited comment on SOLR-3245 at 10/22/12 9:51 AM:


Same problem here, with a French dictionary in Solr 3.6.

With Hunspell: ~5 documents/s
Without Hunspell: ~280 documents/s

Has anyone found a solution?
Quite sad, as this is a very important feature (stemming is poor with Snowball).

  was (Author: rohk):
Same problem here, with a French dictionary in Solr 3.6.

With Hunspell: ~5 documents/s
Without Hunspell: ~280 documents/s

Has anyone found a solution?
  
 Poor performance of Hunspell with Polish Dictionary
 ---

 Key: SOLR-3245
 URL: https://issues.apache.org/jira/browse/SOLR-3245
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 4.0-ALPHA
 Environment: Centos 6.2, kernel 2.6.32, 2 physical CPU Xeon 5606 (4 
 cores each), 32 GB RAM, 2 SSD disks in RAID 0, java version 1.6.0_26, java 
 settings -server -Xms4096M -Xmx4096M 
Reporter: Agnieszka
  Labels: performance
 Attachments: pl_PL.zip


 In Solr 4.0 the Hunspell stemmer with a Polish dictionary has poor 
 performance, whereas the performance of hunspell from 
 http://code.google.com/p/lucene-hunspell/ in Solr 3.4 is very good. 
 Tests show:
 Solr 3.4, full import of 489017 documents:
 StempelPolishStemFilterFactory - 2908 seconds, 168 docs/sec 
 HunspellStemFilterFactory - 3922 seconds, 125 docs/sec
 Solr 4.0, full import of 489017 documents:
 StempelPolishStemFilterFactory - 3016 seconds, 162 docs/sec 
 HunspellStemFilterFactory - 44580 seconds (more than 12 hours), 11 docs/sec
 My schema is quite simple. For Hunspell I have one text field that I copy 14 
 text fields to:
 {code:xml}
 <field name="text" type="text_pl_hunspell" indexed="true" stored="false" 
        multiValued="true"/>
 <copyField source="field1" dest="text"/>

 <copyField source="field14" dest="text"/>
 {code}
 The text_pl_hunspell configuration:
 {code:xml}
 <fieldType name="text_pl_hunspell" class="solr.TextField" 
            positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="dict/stopwords_pl.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.HunspellStemFilterFactory" 
             dictionary="dict/pl_PL.dic" affix="dict/pl_PL.aff" ignoreCase="true"/>
     <!--<filter class="solr.KeywordMarkerFilterFactory" 
             protected="protwords_pl.txt"/>-->
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" 
             synonyms="dict/synonyms_pl.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="dict/stopwords_pl.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.HunspellStemFilterFactory" 
             dictionary="dict/pl_PL.dic" affix="dict/pl_PL.aff" ignoreCase="true"/>
     <filter class="solr.KeywordMarkerFilterFactory" 
             protected="dict/protwords_pl.txt"/>
   </analyzer>
 </fieldType>
 {code}
 I use a Polish dictionary (pl_PL.dic, pl_PL.aff; the files stopwords_pl.txt, 
 protwords_pl.txt and synonyms_pl.txt are empty). These are the same files I 
 used in the 3.4 version. 
 For the Polish stemmer the difference is only in the definition of the text field:
 {code}
 <field name="text" type="text_pl" indexed="true" stored="false" 
        multiValued="true"/>
 <fieldType name="text_pl" class="solr.TextField" 
            positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="dict/stopwords_pl.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StempelPolishStemFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory" 
             protected="dict/protwords_pl.txt"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" 
             synonyms="dict/synonyms_pl.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="dict/stopwords_pl.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StempelPolishStemFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory" 
 

[jira] [Created] (SOLR-3974) Disabling External entity resolution when using XSL in DIH

2012-10-22 Thread Stephane Gamard (JIRA)
Stephane Gamard created SOLR-3974:
-

 Summary: Disabling External entity resolution when using XSL in DIH
 Key: SOLR-3974
 URL: https://issues.apache.org/jira/browse/SOLR-3974
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 4.0, 4.1
Reporter: Stephane Gamard


When using an XSL transformation in DIH, Solr tries to resolve the DTD and 
fails when it is missing. This is similar to SOLR-3895 (which is solely 
intended for the RequestHandler). 

Sample data-config.xml:
{code:xml}
<entity name="sample" 
        processor="FileListEntityProcessor" 
        baseDir="/Volumes/data/datasets/sample" 
        fileName="^.*\.xml$" 
        recursive="true" 
        rootEntity="false"
        dataSource="null">

  <entity name="article" 
          stream="false"
          xsl="xslt/toDocument.xslt" 
          processor="XPathEntityProcessor" 
          url="${sample.fileAbsolutePath}" 
          useSolrAddSchema="true">
  </entity>
</entity>
{code}

Import will fail with the following error: 
{code}
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 
Exception in applying XSL Transformeation Processing Document # 1
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:304)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:224)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:204)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:472)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:498)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:411)
... 5 more
Caused by: javax.xml.transform.TransformerException: 
javax.xml.transform.TransformerException: 
com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: 
/opt/solr/archivearticle3.dtd (No such file or directory)
at 
com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:735)
at 
com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:336)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:299)
... 11 more
Caused by: javax.xml.transform.TransformerException: 
com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: 
/opt/solr/archivearticle3.dtd (No such file or directory)
at 
com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getDOM(TransformerImpl.java:564)
at 
com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:725)
... 13 more
Caused by: com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: 
/opt/solr/archivearticle3.dtd (No such file or directory)
at 
com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(XSLTCDTMManager.java:460)
at 
com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(XSLTCDTMManager.java:248)
at 
com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getDOM(TransformerImpl.java:542)
... 14 more
{code}
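
For reference, one common way to keep a JAXP transform from resolving external 
DTDs is to drive it from a SAXSource whose XMLReader resolves every external 
entity to an empty stream. A minimal standalone sketch of that approach (not 
the eventual fix for this issue; "toDocument.xslt" and "article.xml" are 
placeholders):

{code}
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public class NoDtdTransform {
  public static void main(String[] args) throws Exception {
    XMLReader reader = XMLReaderFactory.createXMLReader();
    // Resolve every external entity (e.g. the missing archivearticle3.dtd)
    // to an empty stream instead of touching the filesystem or network.
    reader.setEntityResolver(new EntityResolver() {
      public InputSource resolveEntity(String publicId, String systemId) {
        return new InputSource(new StringReader(""));
      }
    });

    Transformer t = TransformerFactory.newInstance()
        .newTransformer(new StreamSource("toDocument.xslt"));
    t.transform(new SAXSource(reader, new InputSource("article.xml")),
                new StreamResult(System.out));
  }
}
{code}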

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3974) Disabling External entity resolution when using XSL in DIH

2012-10-22 Thread Stephane Gamard (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephane Gamard updated SOLR-3974:
--

Component/s: update

 Disabling External entity resolution when using XSL in DIH
 --

 Key: SOLR-3974
 URL: https://issues.apache.org/jira/browse/SOLR-3974
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler, update
Affects Versions: 4.0, 4.1
Reporter: Stephane Gamard

 When using an XSL transformation in DIH, Solr tries to resolve the DTD and 
 fails when it is missing. This is similar to SOLR-3895 (which is solely 
 intended for the RequestHandler). 
 Sample data-config.xml:
 {code:xml}
 <entity name="sample" 
         processor="FileListEntityProcessor" 
         baseDir="/Volumes/data/datasets/sample" 
         fileName="^.*\.xml$" 
         recursive="true" 
         rootEntity="false"
         dataSource="null">

   <entity name="article" 
           stream="false"
           xsl="xslt/toDocument.xslt" 
           processor="XPathEntityProcessor" 
           url="${sample.fileAbsolutePath}" 
           useSolrAddSchema="true">
   </entity>
 </entity>
 {code}
 Import will fail with the following error: 
 {code}
 Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 
 Exception in applying XSL Transformeation Processing Document # 1
   at 
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
   at 
 org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:304)
   at 
 org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:224)
   at 
 org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:204)
   at 
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
   at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:472)
   at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:498)
   at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:411)
   ... 5 more
 Caused by: javax.xml.transform.TransformerException: 
 javax.xml.transform.TransformerException: 
 com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: 
 /opt/solr/archivearticle3.dtd (No such file or directory)
   at 
 com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:735)
   at 
 com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:336)
   at 
 org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:299)
   ... 11 more
 Caused by: javax.xml.transform.TransformerException: 
 com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: 
 /opt/solr/archivearticle3.dtd (No such file or directory)
   at 
 com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getDOM(TransformerImpl.java:564)
   at 
 com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:725)
   ... 13 more
 Caused by: com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: 
 /opt/solr/archivearticle3.dtd (No such file or directory)
   at 
 com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(XSLTCDTMManager.java:460)
   at 
 com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(XSLTCDTMManager.java:248)
   at 
 com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getDOM(TransformerImpl.java:542)
   ... 14 more
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3975) Document Summarization toolkit, using LSA techniques

2012-10-22 Thread Lance Norskog (JIRA)
Lance Norskog created SOLR-3975:
---

 Summary: Document Summarization toolkit, using LSA techniques
 Key: SOLR-3975
 URL: https://issues.apache.org/jira/browse/SOLR-3975
 Project: Solr
  Issue Type: New Feature
Reporter: Lance Norskog
Priority: Minor
 Attachments: 4.1.summary.patch, reuters.sh

This package analyzes sentences and words as used across sentences to rank the 
most important sentences and words. The general topic is called document 
summarization and is a popular research topic in textual analysis. 

How to use:
1) Check out the 4.x branch, apply the patch, build, and run the solr/example 
instance.
2) Download the first Reuters article corpus from:
http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.tar.gz
3) Unpack this into a directory.
4) Run the attached 'reuters.sh' script:
sh reuters.sh <directory> http://localhost:8983/solr/collection1
5) Wait several minutes.

Now go to http://localhost:8983/solr/collection1/browse?summary=true and look 
at the large gray box marked 'Document Summary'. This has a table of statistics 
about the analysis, the three most important sentences, and several of the most 
important words in the documents. The sentences have the important tags in 
italics.

The code is packaged as a search component and as an analysis handler. The 
/browse demo uses the search component, and you can also post raw text to  
http://localhost:8983/solr/collection1/analysis/summary. Here is a sample 
command:
curl -s 
"http://localhost:8983/solr/analysis/summary?indent=true&echoParams=all&file=$FILE&wt=xml" 
 --data-binary @$FILE -H 'Content-type:application/xml'

This is an implementation of LSA-based document summarization. A short 
explanation and a long evaluation are described in my blog, [Uncle Lance's 
Ultra Whiz Bang|http://ultrawhizbang.blogspot.com], starting here: 
[http://ultrawhizbang.blogspot.com/2012/09/document-summarization-with-lsa-1.html]
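
For the curious, the core LSA trick behind ranking sentences can be shown in a 
few lines: build a term-by-sentence matrix and find its dominant singular 
vector. This toy (power iteration on A^T*A, with made-up counts) is only an 
illustration of the idea, not the patch, which presumably does far more 
(sentence detection, term weighting, multiple singular vectors):

{code}
import java.util.Arrays;

public class LsaRankSketch {
  public static void main(String[] args) {
    // a[t][s] = count of term t in sentence s (3 terms x 4 sentences).
    double[][] a = {
        {1, 0, 2, 0},
        {1, 1, 1, 0},
        {0, 1, 1, 1},
    };
    int sentences = a[0].length;

    // Power iteration: v converges to the dominant right singular vector.
    double[] v = new double[sentences];
    Arrays.fill(v, 1.0);
    for (int iter = 0; iter < 100; iter++) {
      double[] av = new double[a.length];                 // av = A * v
      for (int t = 0; t < a.length; t++)
        for (int s = 0; s < sentences; s++) av[t] += a[t][s] * v[s];
      double[] w = new double[sentences];                 // w = A^T * av
      for (int s = 0; s < sentences; s++)
        for (int t = 0; t < a.length; t++) w[s] += a[t][s] * av[t];
      double norm = 0;
      for (double x : w) norm += x * x;
      norm = Math.sqrt(norm);
      for (int s = 0; s < sentences; s++) v[s] = w[s] / norm;
    }
    // A larger |v[s]| means sentence s contributes more to the dominant
    // "concept", which is the basis for ranking its importance.
    System.out.println(Arrays.toString(v));
  }
}
{code}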



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3975) Document Summarization toolkit, using LSA techniques

2012-10-22 Thread Lance Norskog (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lance Norskog updated SOLR-3975:


Attachment: reuters.sh
4.1.summary.patch

 Document Summarization toolkit, using LSA techniques
 

 Key: SOLR-3975
 URL: https://issues.apache.org/jira/browse/SOLR-3975
 Project: Solr
  Issue Type: New Feature
Reporter: Lance Norskog
Priority: Minor
 Attachments: 4.1.summary.patch, reuters.sh


 This package analyzes sentences and words as used across sentences to rank 
 the most important sentences and words. The general topic is called document 
 summarization and is a popular research topic in textual analysis. 
 How to use:
 1) Check out the 4.x branch, apply the patch, build, and run the solr/example 
 instance.
 2) Download the first Reuters article corpus from:
 http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.tar.gz
 3) Unpack this into a directory.
 4) Run the attached 'reuters.sh' script:
 sh reuters.sh <directory> http://localhost:8983/solr/collection1
 5) Wait several minutes.
 Now go to http://localhost:8983/solr/collection1/browse?summary=true and look 
 at the large gray box marked 'Document Summary'. This has a table of 
 statistics about the analysis, the three most important sentences, and 
 several of the most important words in the documents. The sentences have the 
 important tags in italics.
 The code is packaged as a search component and as an analysis handler. The 
 /browse demo uses the search component, and you can also post raw text to  
 http://localhost:8983/solr/collection1/analysis/summary. Here is a sample 
 command:
 curl -s 
 "http://localhost:8983/solr/analysis/summary?indent=true&echoParams=all&file=$FILE&wt=xml" 
  --data-binary @$FILE -H 'Content-type:application/xml'
 This is an implementation of LSA-based document summarization. A short 
 explanation and a long evaluation are described in my blog, [Uncle Lance's 
 Ultra Whiz Bang|http://ultrawhizbang.blogspot.com], starting here: 
 [http://ultrawhizbang.blogspot.com/2012/09/document-summarization-with-lsa-1.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3975) Document Summarization toolkit, using LSA techniques

2012-10-22 Thread Lance Norskog (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lance Norskog updated SOLR-3975:


Description: 
This package analyzes sentences and words as used across sentences to rank the 
most important sentences and words. The general topic is called document 
summarization and is a popular research topic in textual analysis. 

How to use:
1) Check out the 4.x branch, apply the patch, build, and run the solr/example 
instance.
2) Download the first Reuters article corpus from:
http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.tar.gz
3) Unpack this into a directory.
4) Run the attached 'reuters.sh' script:
sh reuters.sh <directory> http://localhost:8983/solr/collection1
5) Wait several minutes.

Now go to http://localhost:8983/solr/collection1/browse?summary=true and look 
at the large gray box marked 'Document Summary'. This has a table of statistics 
about the analysis, the three most important sentences, and several of the most 
important words in the documents. The sentences have the important words in 
italics.

The code is packaged as a search component and as an analysis handler. The 
/browse demo uses the search component, and you can also post raw text to  
http://localhost:8983/solr/collection1/analysis/summary. Here is a sample 
command:
{code}
curl -s 
"http://localhost:8983/solr/analysis/summary?indent=true&echoParams=all&file=$FILE&wt=xml" 
 --data-binary @$FILE -H 'Content-type:application/xml'
{code}

This is an implementation of LSA-based document summarization. A short 
explanation and a long evaluation are described in my blog, [Uncle Lance's 
Ultra Whiz Bang|http://ultrawhizbang.blogspot.com], starting here: 
[http://ultrawhizbang.blogspot.com/2012/09/document-summarization-with-lsa-1.html]



  was:
This package analyzes sentences and words as used across sentences to rank the 
most important sentences and words. The general topic is called document 
summarization and is a popular research topic in textual analysis. 

How to use:
1) Check out the 4.x branch, apply the patch, build, and run the solr/example 
instance.
2) Download the first Reuters article corpus from:
http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.tar.gz
3) Unpack this into a directory.
4) Run the attached 'reuters.sh' script:
sh reuters.sh <directory> http://localhost:8983/solr/collection1
5) Wait several minutes.

Now go to http://localhost:8983/solr/collection1/browse?summary=true and look 
at the large gray box marked 'Document Summary'. This has a table of statistics 
about the analysis, the three most important sentences, and several of the most 
important words in the documents. The sentences have the important tags in 
italics.

The code is packaged as a search component and as an analysis handler. The 
/browse demo uses the search component, and you can also post raw text to  
http://localhost:8983/solr/collection1/analysis/summary. Here is a sample 
command:
curl -s 
"http://localhost:8983/solr/analysis/summary?indent=true&echoParams=all&file=$FILE&wt=xml" 
 --data-binary @$FILE -H 'Content-type:application/xml'

This is an implementation of LSA-based document summarization. A short 
explanation and a long evaluation are described in my blog, [Uncle Lance's 
Ultra Whiz Bang|http://ultrawhizbang.blogspot.com], starting here: 
[http://ultrawhizbang.blogspot.com/2012/09/document-summarization-with-lsa-1.html]




 Document Summarization toolkit, using LSA techniques
 

 Key: SOLR-3975
 URL: https://issues.apache.org/jira/browse/SOLR-3975
 Project: Solr
  Issue Type: New Feature
Reporter: Lance Norskog
Priority: Minor
 Attachments: 4.1.summary.patch, reuters.sh


 This package analyzes sentences and words as used across sentences to rank 
 the most important sentences and words. The general topic is called document 
 summarization and is a popular research topic in textual analysis. 
 How to use:
 1) Check out the 4.x branch, apply the patch, build, and run the solr/example 
 instance.
 2) Download the first Reuters article corpus from:
 http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.tar.gz
 3) Unpack this into a directory.
 4) Run the attached 'reuters.sh' script:
 sh reuters.sh <directory> http://localhost:8983/solr/collection1
 5) Wait several minutes.
 Now go to http://localhost:8983/solr/collection1/browse?summary=true and look 
 at the large gray box marked 'Document Summary'. This has a table of 
 statistics about the analysis, the three most important sentences, and 
 several of the most important words in the documents. The sentences have the 
 important words in italics.
 The code is packaged as a search component and as an analysis handler. The 
 /browse demo uses the search component, and you can also post raw text to  
 

[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores

2012-10-22 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481296#comment-13481296
 ] 

Erick Erickson commented on SOLR-1293:
--

Well, I think this JIRA will finally get some action...

Jose: 
The actual availability of any particular feature is best tracked by the 
actual JIRA ticket. The "fix version" is usually the earliest _possible_ fix. 
Not until the resolution is something like "fixed" is the code really in the 
code line.

All:
OK, I'm thinking along these lines. I've started implementation, but wanted to 
open up the discussion in case I'm going down the wrong path.

Assumption:
1. For installations with multiple thousands of cores, provision has to be 
made for some kind of administrative process, probably an RDBMS, that really 
maintains this information.


So here's a brief outline of the approach I'm thinking about.
1. Add an additional optional parameter to the <cores> entry in solr.xml, 
LRUCacheSize=# (what default?).
2. Implement SOLR-1306: allow a data provider to be specified in solr.xml that 
gives back core descriptions, something like <coreDescriptorProvider 
class="com.foo.FooDataProvider" [attr=val]/> (don't quite know what attrs we 
want, if any).
3. Add two optional attributes to individual <core> entries:
   a. sticky=true|false. Default to true. Any cores marked with this would 
never be aged out; essentially treat them just as current. 
   b. loadOnStartup=true|false, default to true.
4. So the process of getting a core would be something like:
   a. Check the normal list, just like now. If a core was found, return it.
   b. Check the LRU list; if a core was found, return it.
   c. Ask the data provider (if defined) for the core descriptor. Create the 
core and put it in the LRU list.
   d. Remove any core entries over the LRU limit. Any hints on the right cache 
to use? There's the Lucene LRUCache, ConcurrentLRUCache, and the LRUHashMap in 
Lucene (that I can't find in any of the compiled jars). I've got to close 
the core as it's removed. It _looks_ like I can use ConcurrentLRUCache and 
add a listener to close the core when it's removed from the list (rough 
sketch below).
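
To make 4(d) concrete, here's a rough single-threaded sketch of the aging-out 
idea using a plain LinkedHashMap in access order -- purely illustrative; the 
real thing would want ConcurrentLRUCache (or synchronization), and sticky 
cores would simply never enter this map:

{code}
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.solr.core.SolrCore;

class TransientCoreCache extends LinkedHashMap<String, SolrCore> {
  private final int maxSize;

  TransientCoreCache(int maxSize) {
    super(16, 0.75f, true);   // accessOrder=true: iteration is LRU-first
    this.maxSize = maxSize;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<String, SolrCore> eldest) {
    if (size() > maxSize) {
      eldest.getValue().close();  // close the core as it is aged out...
      return true;                // ...and let the map evict the entry
    }
    return false;
  }
}
{code}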

Processing-wise, in the usual case this would cost an extra check each time a 
core was fetched. If (a) above failed, we would have to see if the data 
provider was defined before returning null. I don't think that's onerous; the 
rest of the costs would only be incurred when a data provider _did_ exist.

But one design decision here is along these lines: what to do with persistence 
and stickiness? Specifically, if the coreDescriptorProvider gives us a core 
from, say, an RDBMS, should we allow that core to be persisted into the 
solr.xml file if they've set persist=true in solr.xml? I'm thinking that we 
can make this all work with maximum flexibility if we allow the 
coreDescriptorProvider to tell us whether we should persist any core currently 
loaded.

Anyway, I'll be fleshing this out over the next little while, anybody want to 
weigh in?

Erick



 Support for large no:of cores and faster loading/unloading of cores
 ---

 Key: SOLR-1293
 URL: https://issues.apache.org/jira/browse/SOLR-1293
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Reporter: Noble Paul
 Fix For: 4.1

 Attachments: SOLR-1293.patch


 Solr, currently, is not very suitable for a large number of homogeneous cores 
 where you require fast/frequent loading/unloading of cores. Usually a core 
 is required to be loaded just to fire a search query or to just index one 
 document.
 The requirements of such a system are:
 * Very efficient loading of cores. Solr cannot afford to read and parse and 
 create Schema and SolrConfig objects for each core each time the core has to 
 be loaded (SOLR-919, SOLR-920).
 * START/STOP core. Currently it is only possible to unload a core (SOLR-880).
 * Automatic loading of cores. If a core is present and it is not loaded and 
 a request comes for it, load it automatically before serving up the request.
 * As there are a large number of cores, all the cores cannot be kept loaded 
 always. There has to be an upper limit beyond which we need to unload a few 
 cores (probably the least recently used ones).
 * Automatic allotment of dataDir for cores. If the number of cores is too 
 high, all the cores' dataDirs cannot live in the same dir. There is an upper 
 limit on the number of dirs you can create in a Unix dir w/o affecting 
 performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org

[jira] [Commented] (LUCENE-4496) Don't decode unnecessary freq blocks in 4.1 codec

2012-10-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481299#comment-13481299
 ] 

Michael McCandless commented on LUCENE-4496:


+1

 Don't decode unnecessary freq blocks in 4.1 codec
 -

 Key: LUCENE-4496
 URL: https://issues.apache.org/jira/browse/LUCENE-4496
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Affects Versions: 4.1
Reporter: Robert Muir
 Attachments: LUCENE-4496.patch, LUCENE-4496.patch


 TermsEnum.docs() has an expert flag to specify that you don't require 
 frequencies. This is currently passed by some callers that don't need them: 
 we should call ForUtil.skipBlock instead of ForUtil.readBlock in this case.
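
 The idea in miniature (a toy with java.io streams, not Lucene's actual ForUtil 
 code -- the block layout and names here are made up): decoding costs per-value 
 work, while skipping is just a seek past the block.
 {code}
 import java.io.ByteArrayInputStream;
 import java.io.ByteArrayOutputStream;
 import java.io.DataInputStream;
 import java.io.DataOutputStream;

 public class SkipVsRead {
   public static void main(String[] args) throws Exception {
     // Write a length-prefixed "freq block" followed by more postings data.
     ByteArrayOutputStream bytes = new ByteArrayOutputStream();
     DataOutputStream out = new DataOutputStream(bytes);
     out.writeInt(3);                                    // values in the block
     for (int f : new int[] {2, 7, 1}) out.writeInt(f);  // the freq block
     out.writeInt(42);                                   // next postings datum

     boolean needsFreqs = false;   // caller didn't ask for frequencies
     DataInputStream in = new DataInputStream(
         new ByteArrayInputStream(bytes.toByteArray()));
     int len = in.readInt();
     if (needsFreqs) {
       for (int i = 0; i < len; i++) in.readInt();       // "readBlock": decode
     } else {
       in.skipBytes(len * 4);                            // "skipBlock": seek
     }
     System.out.println(in.readInt());                   // 42 either way
   }
 }
 {code}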

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



entry with enumeration

2012-10-22 Thread avaianobill
Hello, I would like to know if it is possible to automatically create entries
in the index that arrange different terms (a concatenation of the terms) in a
single entry in the index. The condition for creating this enumeration for the
terms would be a document property.

For example:

Nodes: [c1,c5]

Employee: c1,c2,c3,c4
Person: c2,c3,c5
Sector: c3,c4,c5

I would like to create this automatically:

Employee###Person: c2,c3
Employee###Sector: c3,c4
Person###Sector: c3,c5
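
As far as I know nothing in Lucene does this out of the box, but the set logic 
being asked for looks like the sketch below (standalone, reproducing the 
example above; the "Nodes" document-property condition is left out):

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class PairwiseEntries {
  public static void main(String[] args) {
    // Term -> documents containing it (the example's posting lists).
    Map<String, Set<String>> postings = new TreeMap<String, Set<String>>();
    postings.put("Employee", new TreeSet<String>(Arrays.asList("c1", "c2", "c3", "c4")));
    postings.put("Person",   new TreeSet<String>(Arrays.asList("c2", "c3", "c5")));
    postings.put("Sector",   new TreeSet<String>(Arrays.asList("c3", "c4", "c5")));

    // For every pair of terms, intersect their posting lists and emit a
    // concatenated "termA###termB" entry for the shared documents.
    List<String> terms = new ArrayList<String>(postings.keySet());
    for (int i = 0; i < terms.size(); i++) {
      for (int j = i + 1; j < terms.size(); j++) {
        Set<String> shared = new TreeSet<String>(postings.get(terms.get(i)));
        shared.retainAll(postings.get(terms.get(j)));
        if (!shared.isEmpty()) {
          System.out.println(terms.get(i) + "###" + terms.get(j) + ": " + shared);
        }
      }
    }
    // Prints: Employee###Person: [c2, c3], Employee###Sector: [c3, c4],
    //         Person###Sector: [c3, c5]
  }
}
{code}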




--
View this message in context: 
http://lucene.472066.n3.nabble.com/entry-with-enumeration-tp4015097.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.6.0_35) - Build # 1925 - Failure!

2012-10-22 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/1925/
Java: 32bit/jdk1.6.0_35 -server -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 23483 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:60: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:517: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1937:
 java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:168)
at 
com.sun.net.ssl.internal.ssl.InputRecord.readFully(InputRecord.java:293)
at com.sun.net.ssl.internal.ssl.InputRecord.read(InputRecord.java:331)
at 
com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:863)
at 
com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1203)
at 
com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1230)
at 
com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1214)
at 
sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:434)
at 
sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:166)
at 
sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:133)
at 
org.apache.tools.ant.taskdefs.Get$GetThread.openConnection(Get.java:660)
at org.apache.tools.ant.taskdefs.Get$GetThread.get(Get.java:579)
at org.apache.tools.ant.taskdefs.Get$GetThread.run(Get.java:569)

Total time: 28 minutes 56 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 32bit/jdk1.6.0_35 -server -XX:+UseParallelGC
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4476) maven deployment scripts don't work (except from the machine you made the RC from)

2012-10-22 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481327#comment-13481327
 ] 

Steven Rowe commented on LUCENE-4476:
-

{quote}
bq. Does this also happen on windows if you sign artifacts with your GPG key?

Definitely not! The password is hidden! This is clearly a cygwin issue (and 
only if you use the cygwin console window). With the official Windows 7 cmd.exe 
in the official Windows console window the password is not shown. I never use 
cygwin for building on Windows; why do you, Steven? To run Ant and build 
artifacts a plain cmd.exe is fine.
{quote}

I agree, Uwe - password hiding with Ant's secure input handler works on Win7 
cmd window for me too.  Definitely a cygwin-specific issue.

I use bash under an Xterm, because I feel like it :) - it's the maximally 
Unix-ish experience on Windows.  Also, when mixing native binaries and Cygwin 
binaries, it's easier to use Cygwin tools to keep everybody happy from 
bash.exe, rather than from cmd.exe.  Also, the Xterm window is resizeable (win 
console has a fixed width) and is more customizable.

{quote}
Ah, also: if you run bash.exe in the official Windows console windows (not 
cygwin's own), it also works. It's a bug of the dumb cygwin-internal console 
window only (why do they have it?) - sorry, I have to rant about Cygwin; I use 
it, too, but only to execute find/sed/grep...
{quote}

(And perl, and python, and ...)

Interesting, I hadn't considered running bash under the windows console.  Of 
course C:\cygwin\bin\ would have to be on the path.

I agree the cygwin-internal console window is sucky - I never use it.


 maven deployment scripts don't work (except from the machine you made the RC 
 from)
 -

 Key: LUCENE-4476
 URL: https://issues.apache.org/jira/browse/LUCENE-4476
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-4476.patch, LUCENE-4476.patch, LUCENE-4476.patch


 Currently the maven process described in 
 http://wiki.apache.org/lucene-java/PublishMavenArtifacts does not work (on 
 Mac).
 It worked fine for the 4.0-alpha and 4.0-beta releases.
 NOTE: This appears to be working on Linux, so I am going with that. But it 
 seems strange that it doesn't work on Mac.
  
 {noformat}
 [artifact:install-provider] Installing provider: 
 org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-7:runtime
 [artifact:pom] Downloading: 
 org/apache/lucene/lucene-parent/4.0.0/lucene-parent-4.0.0.pom from repository 
 sonatype.releases at http://oss.sonatype.org/content/repositories/releases
 [artifact:pom] Unable to locate resource in repository
 [artifact:pom] [INFO] Unable to find resource 
 'org.apache.lucene:lucene-parent:pom:4.0.0' in repository sonatype.releases 
 (http://oss.sonatype.org/content/repositories/releases)
 [artifact:pom] Downloading: 
 org/apache/lucene/lucene-parent/4.0.0/lucene-parent-4.0.0.pom from repository 
 central at http://repo1.maven.org/maven2
 [artifact:pom] Unable to locate resource in repository
 [artifact:pom] [INFO] Unable to find resource 
 'org.apache.lucene:lucene-parent:pom:4.0.0' in repository central 
 (http://repo1.maven.org/maven2)
 [artifact:pom] An error has occurred while processing the Maven artifact 
 tasks.
 [artifact:pom]  Diagnosis:
 [artifact:pom] 
 [artifact:pom] Unable to initialize POM lucene-test-framework-4.0.0.pom: 
 Cannot find parent: org.apache.lucene:lucene-parent for project: 
 org.apache.lucene:lucene-test-framework:jar:null for project 
 org.apache.lucene:lucene-test-framework:jar:null
 [artifact:pom] Unable to download the artifact from any repository
 BUILD FAILED
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3976) HTMLStripTransformer strips the tika field not the field to index - cannot have both (stripped and unstripped)

2012-10-22 Thread Markus Klose (JIRA)
Markus Klose created SOLR-3976:
--

 Summary: HTMLStripTransformer strips the tika field not the 
field to index - cannot have both (stripped and unstripped)
 Key: SOLR-3976
 URL: https://issues.apache.org/jira/browse/SOLR-3976
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 3.6
Reporter: Markus Klose
Priority: Minor


I ran into this situation when indexing an HTML file using the dataimport 
handler and got unexpected output. I wanted to create one field with the 
original content and one field with the same content but without HTML markup.

If I enable the HTMLStripTransformer on field text2, the other one (text1) is 
stripped as well.


Example configuration:

<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <entity name="f" processor="FileListEntityProcessor" 
            recursive="true" rootEntity="false"
            dataSource="null" baseDir="" fileName=".*.html"
            onError="skip">

      <entity name="tika-test" 
              processor="TikaEntityProcessor" url="${f.fileAbsolutePath}"
              format="html" dataSource="bin" onError="skip" 
              transformer="HTMLStripTransformer,TemplateTransformer">

        <field column="id" template="${f.file}"/>

        <field column="text" name="text1"/>
        <field column="text" name="text2" 
               stripHTML="true"/>
      </entity>
    </entity>
  </document>
</dataConfig>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-3976) HTMLStripTransformer strips the tika field not the field to index - cannot have both (stripped and unstripped)

2012-10-22 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-3976.
--

Resolution: Not A Problem

Please raise this kind of issue on the user's list rather than in a JIRA first, 
in case it has a simple resolution.

In this case, I'd use a copyField from text1 to text2 in your schema.xml.

 HTMLStripTransformer strips the tika field not the field to index - cannot 
 have both (stripped and unstripped)
 -

 Key: SOLR-3976
 URL: https://issues.apache.org/jira/browse/SOLR-3976
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 3.6
Reporter: Markus Klose
Priority: Minor

 I ran into this situation when indexing an HTML file using the dataimport 
 handler and got unexpected output. I wanted to create one field with the 
 original content and one field with the same content but without HTML markup.
 If I enable the HTMLStripTransformer on field text2, the other one (text1) is 
 stripped as well.
 Example configuration:
 <dataConfig>
   <dataSource type="BinFileDataSource" name="bin"/>
   <document>
     <entity name="f" processor="FileListEntityProcessor" 
             recursive="true" rootEntity="false"
             dataSource="null" baseDir="" fileName=".*.html"
             onError="skip">

       <entity name="tika-test" 
               processor="TikaEntityProcessor" url="${f.fileAbsolutePath}"
               format="html" dataSource="bin" onError="skip" 
               transformer="HTMLStripTransformer,TemplateTransformer">

         <field column="id" template="${f.file}"/>

         <field column="text" name="text1"/>
         <field column="text" name="text2" 
                stripHTML="true"/>
       </entity>
     </entity>
   </document>
 </dataConfig>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3976) HTMLStripTransformer strips the tika field not the field to index - cannot have both (stripped and unstripped)

2012-10-22 Thread Markus Klose (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481356#comment-13481356
 ] 

Markus Klose commented on SOLR-3976:


If it sounded like "help me index an HTML file", I am sorry. I just thought 
that this is a bug and should be posted here. Please close if necessary.


We created a workaround with a sub-entity like:

<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <entity name="f" processor="FileListEntityProcessor" 
            recursive="true" rootEntity="false"
            dataSource="null" baseDir="..." fileName=".*.html"
            onError="skip" transformer="TemplateTransformer">

      <entity name="tika-test" 
              processor="TikaEntityProcessor" url="${f.fileAbsolutePath}"
              format="html" dataSource="bin" onError="skip" 
              transformer="TemplateTransformer,RegexTransformer,DateFormatTransformer,HTMLStripTransformer">

        <field column="id" template="${f.file}"/>

        <field column="text" name="text1"/>

        <entity name="tika2" 
                processor="TikaEntityProcessor" url="${f.fileAbsolutePath}"
                format="html" dataSource="bin" 
                onError="skip" transformer="TemplateTransformer,HTMLStripTransformer">
          <field column="text" name="text2" 
                 stripHTML="false"/>
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>

 HTMLStripTransformer strips the tika field not the field to index - cannot 
 have both (stripped and unstripped)
 -

 Key: SOLR-3976
 URL: https://issues.apache.org/jira/browse/SOLR-3976
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 3.6
Reporter: Markus Klose
Priority: Minor

 I ran into this situation when indexing an HTML file using the dataimport 
 handler and got unexpected output. I wanted to create one field with the 
 original content and one field with the same content but without HTML markup.
 If I enable the HTMLStripTransformer on field text2, the other one (text1) is 
 stripped as well.
 Example configuration:
 <dataConfig>
   <dataSource type="BinFileDataSource" name="bin"/>
   <document>
     <entity name="f" processor="FileListEntityProcessor" 
             recursive="true" rootEntity="false"
             dataSource="null" baseDir="" fileName=".*.html"
             onError="skip">

       <entity name="tika-test" 
               processor="TikaEntityProcessor" url="${f.fileAbsolutePath}"
               format="html" dataSource="bin" onError="skip" 
               transformer="HTMLStripTransformer,TemplateTransformer">

         <field column="id" template="${f.file}"/>

         <field column="text" name="text1"/>
         <field column="text" name="text2" 
                stripHTML="true"/>
       </entity>
     </entity>
   </document>
 </dataConfig>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-880) SolrCore should have a STOP option and a lazy startup option

2012-10-22 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson reassigned SOLR-880:
---

Assignee: Erick Erickson  (was: Shalin Shekhar Mangar)

 SolrCore should have a STOP option and a lazy startup option
 

 Key: SOLR-880
 URL: https://issues.apache.org/jira/browse/SOLR-880
 Project: Solr
  Issue Type: Improvement
  Components: multicore
Reporter: Noble Paul
Assignee: Erick Erickson

 * We must have an option to STOP and START a core. 
 * A core should have an option of loadOnStartup=true|false; the default 
 should be true.
 * A LIST command which can give the names of all cores and some meta 
 information like status.
 If there are too many cores (tens of thousands) where each of them may be 
 used only occasionally, we should not load all of them at once. At runtime I 
 should be able to STOP and START a core on demand. A listing command would 
 let me know which ones are present and what is up and what is down. A stopped 
 core must not use any resources.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-3976) HTMLStripTransformer strips the tika field not the field to index - cannot have both (stripped and unstripped)

2012-10-22 Thread Markus Klose (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481356#comment-13481356
 ] 

Markus Klose edited comment on SOLR-3976 at 10/22/12 1:35 PM:
--

If it sounded like "help me index an HTML file", I am sorry. I just thought 
that this is a bug and should be posted here. Please close if necessary.


We created a workaround with a sub-entity like:

<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <entity name="f" processor="FileListEntityProcessor" 
            recursive="true" rootEntity="false"
            dataSource="null" baseDir="..." fileName=".*.html"
            onError="skip" transformer="TemplateTransformer">

      <entity name="tika-test" 
              processor="TikaEntityProcessor" url="${f.fileAbsolutePath}"
              format="html" dataSource="bin" onError="skip" 
              transformer="TemplateTransformer,RegexTransformer,DateFormatTransformer,HTMLStripTransformer">

        <field column="id" template="${f.file}"/>

        <field column="text" name="text1"/>

        <entity name="tika2" 
                processor="TikaEntityProcessor" url="${f.fileAbsolutePath}"
                format="html" dataSource="bin" 
                onError="skip" transformer="TemplateTransformer,HTMLStripTransformer">
          <field column="text" name="text2" 
                 stripHTML="true"/>
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>

  was (Author: markus-klose):
If it sounded like "help me index an HTML file", I am sorry. I just thought 
that this is a bug and should be posted here. Please close if necessary.


We created a workaround with a sub-entity like:

<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <entity name="f" processor="FileListEntityProcessor" 
            recursive="true" rootEntity="false"
            dataSource="null" baseDir="..." fileName=".*.html"
            onError="skip" transformer="TemplateTransformer">

      <entity name="tika-test" 
              processor="TikaEntityProcessor" url="${f.fileAbsolutePath}"
              format="html" dataSource="bin" onError="skip" 
              transformer="TemplateTransformer,RegexTransformer,DateFormatTransformer,HTMLStripTransformer">

        <field column="id" template="${f.file}"/>

        <field column="text" name="text1"/>

        <entity name="tika2" 
                processor="TikaEntityProcessor" url="${f.fileAbsolutePath}"
                format="html" dataSource="bin" 
                onError="skip" transformer="TemplateTransformer,HTMLStripTransformer">
          <field column="text" name="text2" 
                 stripHTML="false"/>
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>
  
 HTMLStripTransformer strips the tika field not the field to index - cannot 
 have both (stripped and unstripped)
 -

 Key: SOLR-3976
 URL: https://issues.apache.org/jira/browse/SOLR-3976
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 3.6
Reporter: Markus Klose
Priority: Minor

 I ran into this situation when indexing an HTML file using the dataimport 
 handler and got unexpected output. I wanted to create one field with the 
 original content and one field with the same content but without HTML markup.
 If I enable the HTMLStripTransformer on field text2, the other one (text1) is 
 stripped as well.
 Example configuration:
 <dataConfig>
   <dataSource type="BinFileDataSource" name="bin"/>
   <document>
     <entity name="f" processor="FileListEntityProcessor" 
             recursive="true" rootEntity="false"
             dataSource="null" baseDir="" fileName=".*.html"
             onError="skip">

       <entity name="tika-test" 
               processor="TikaEntityProcessor" url="${f.fileAbsolutePath}"
               format="html" dataSource="bin" onError="skip" 
               transformer="HTMLStripTransformer,TemplateTransformer">

         <field column="id" template="${f.file}"/>

         <field column="text" name="text1"/>
         <field column="text" 

[jira] [Assigned] (SOLR-1028) Automatic core loading unloading for multicore

2012-10-22 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson reassigned SOLR-1028:


Assignee: Erick Erickson

 Automatic core loading unloading for multicore
 --

 Key: SOLR-1028
 URL: https://issues.apache.org/jira/browse/SOLR-1028
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Reporter: Noble Paul
Assignee: Erick Erickson
 Fix For: 4.1


 usecase: I have many small cores (say one per user) on a single Solr box. 
 All the cores are not always needed, but when I need one I should be able 
 to directly issue a search request and the core must be STARTED automatically 
 and the request must be served.
 This also requires that I must have an upper limit on the no:of cores that 
 should be loaded at any given point in time. If the limit is crossed, the 
 CoreContainer must unload a core (preferably the least recently used core). 
 There must be a choice of specifying some cores as fixed. These cores must 
 never be unloaded.
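
A minimal sketch of the unload policy described above: an access-ordered LRU 
map with an upper bound plus a pinned ("fixed") set. The names are 
illustrative, not CoreContainer's actual API:

import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class LruCoreCache<K, V> {
  private final Set<K> pinned = new HashSet<K>();
  private final LinkedHashMap<K, V> cores;

  LruCoreCache(final int maxLoaded) {
    this.cores = new LinkedHashMap<K, V>(16, 0.75f, true) { // true = access order
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Unload the least recently used core once the limit is crossed,
        // but never a pinned one (a real implementation would then evict
        // the next-oldest unpinned core instead of growing).
        return size() > maxLoaded && !pinned.contains(eldest.getKey());
      }
    };
  }

  void pin(K coreName) { pinned.add(coreName); }        // mark a core as "fixed"
  V get(K coreName) { return cores.get(coreName); }     // touches LRU order
  void put(K coreName, V core) { cores.put(coreName, core); }
}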

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores

2012-10-22 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481368#comment-13481368
 ] 

Jack Krupansky commented on SOLR-1293:
--

bq. an RDBMS

Is a full RDBMS needed? How about a NoSQL approach... like... um... Solr (or 
raw Lucene) itself?


 Support for large no:of cores and faster loading/unloading of cores
 ---

 Key: SOLR-1293
 URL: https://issues.apache.org/jira/browse/SOLR-1293
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Reporter: Noble Paul
 Fix For: 4.1

 Attachments: SOLR-1293.patch


 Solr, currently, is not very suitable for a large no:of homogeneous cores 
 where you require fast/frequent loading/unloading of cores. Usually a core 
 is required to be loaded just to fire a search query or to just index one 
 document.
 The requirements of such a system are:
 * Very efficient loading of cores. Solr cannot afford to read and parse and 
 create Schema, SolrConfig objects for each core each time the core has to be 
 loaded (SOLR-919, SOLR-920)
 * START/STOP core. Currently it is only possible to unload a core (SOLR-880)
 * Automatic loading of cores. If a core is present and it is not loaded and 
 a request comes for it, load it automatically before serving up the request
 * As there are a large no:of cores, all the cores cannot be kept loaded 
 always. There has to be an upper limit beyond which we need to unload a few 
 cores (probably the least recently used ones)
 * Automatic allotment of dataDir for cores. If the no:of cores is too high, all 
 the cores' dataDirs cannot live in the same dir. There is an upper limit on 
 the no:of dirs you can create in a unix dir w/o affecting performance

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4497) Don't write posVIntCount in 4.1 codec

2012-10-22 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4497:
---

 Summary: Don't write posVIntCount in 4.1 codec
 Key: LUCENE-4497
 URL: https://issues.apache.org/jira/browse/LUCENE-4497
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Robert Muir


It's confusing and unnecessary that we compute this from docFreq for the 
doc/freq vint count, but write it for the positions case: it's totalTermFreq % 
BLOCK_SIZE.
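
Put differently, both leftover (non-full-block) chunk lengths are derivable 
from per-term statistics the term dictionary already stores. A tiny sketch, 
assuming the 4.1 codec's packed-block length of 128:

// Neither leftover ("vInt") chunk length needs to be written to the file.
final class LeftoverSketch {
  static final int BLOCK_SIZE = 128;      // packed block length, 4.1 codec

  static int leftoverDocs(int docFreq) {
    return docFreq % BLOCK_SIZE;          // already derived today, not stored
  }

  static long leftoverPositions(long totalTermFreq) {
    return totalTermFreq % BLOCK_SIZE;    // derivable the same way
  }
}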

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4497) Don't write posVIntCount in 4.1 codec

2012-10-22 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4497:


Attachment: LUCENE-4497.patch

 Don't write posVIntCount in 4.1 codec
 -

 Key: LUCENE-4497
 URL: https://issues.apache.org/jira/browse/LUCENE-4497
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Robert Muir
 Attachments: LUCENE-4497.patch


 It's confusing and unnecessary that we compute this from docFreq for the 
 doc/freq vint count, but write it for the positions case: it's totalTermFreq % 
 BLOCK_SIZE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores

2012-10-22 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481370#comment-13481370
 ] 

Erick Erickson commented on SOLR-1293:
--

I don't care what's used to store the info. The provider that the user provides 
cares, but that's the point of getting that info through a custom component: 
Solr doesn't need to know. Nor should it <g>...
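
A sketch of that separation, with entirely hypothetical names (no such 
interface exists in Solr):

// The user-supplied component answers "how do I load core X?"; Solr never
// learns whether the backing store is an RDBMS, ZooKeeper, Solr itself, or
// a flat file.
public interface CoreDescriptorProvider {

  /** Return enough information to load the named core, or null if unknown. */
  CoreInfo describe(String coreName);

  /** Minimal illustrative descriptor; not a real Solr class. */
  final class CoreInfo {
    public final String instanceDir;
    public final String dataDir;

    public CoreInfo(String instanceDir, String dataDir) {
      this.instanceDir = instanceDir;
      this.dataDir = dataDir;
    }
  }
}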



 Support for large no:of cores and faster loading/unloading of cores
 ---

 Key: SOLR-1293
 URL: https://issues.apache.org/jira/browse/SOLR-1293
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Reporter: Noble Paul
 Fix For: 4.1

 Attachments: SOLR-1293.patch


 Solr, currently, is not very suitable for a large no:of homogeneous cores 
 where you require fast/frequent loading/unloading of cores. Usually a core 
 is required to be loaded just to fire a search query or to just index one 
 document.
 The requirements of such a system are:
 * Very efficient loading of cores. Solr cannot afford to read and parse and 
 create Schema, SolrConfig objects for each core each time the core has to be 
 loaded (SOLR-919, SOLR-920)
 * START/STOP core. Currently it is only possible to unload a core (SOLR-880)
 * Automatic loading of cores. If a core is present and it is not loaded and 
 a request comes for it, load it automatically before serving up the request
 * As there are a large no:of cores, all the cores cannot be kept loaded 
 always. There has to be an upper limit beyond which we need to unload a few 
 cores (probably the least recently used ones)
 * Automatic allotment of dataDir for cores. If the no:of cores is too high, all 
 the cores' dataDirs cannot live in the same dir. There is an upper limit on 
 the no:of dirs you can create in a unix dir w/o affecting performance

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores

2012-10-22 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481372#comment-13481372
 ] 

Noble Paul commented on SOLR-1293:
--

Rdbms is not required. We ate managing that with the xml itself.  Now that we 
have moved to zookeeper for cloud, we should piggyback on zookeeper for 
everything

 Support for large no:of cores and faster loading/unloading of cores
 ---

 Key: SOLR-1293
 URL: https://issues.apache.org/jira/browse/SOLR-1293
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Reporter: Noble Paul
 Fix For: 4.1

 Attachments: SOLR-1293.patch


 Solr, currently, is not very suitable for a large no:of homogeneous cores 
 where you require fast/frequent loading/unloading of cores. Usually a core 
 is required to be loaded just to fire a search query or to just index one 
 document.
 The requirements of such a system are:
 * Very efficient loading of cores. Solr cannot afford to read and parse and 
 create Schema, SolrConfig objects for each core each time the core has to be 
 loaded (SOLR-919, SOLR-920)
 * START/STOP core. Currently it is only possible to unload a core (SOLR-880)
 * Automatic loading of cores. If a core is present and it is not loaded and 
 a request comes for it, load it automatically before serving up the request
 * As there are a large no:of cores, all the cores cannot be kept loaded 
 always. There has to be an upper limit beyond which we need to unload a few 
 cores (probably the least recently used ones)
 * Automatic allotment of dataDir for cores. If the no:of cores is too high, all 
 the cores' dataDirs cannot live in the same dir. There is an upper limit on 
 the no:of dirs you can create in a unix dir w/o affecting performance

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores

2012-10-22 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481372#comment-13481372
 ] 

Noble Paul edited comment on SOLR-1293 at 10/22/12 2:03 PM:


Rdbms is not required. We are managing that with the xml itself.  Now that we 
have moved to zookeeper for cloud, we should piggyback on zookeeper for 
everything

  was (Author: noble.paul):
Rdbms is not required. We ate managing that with the xml itself.  Now that 
we have moved to zookeeper for cloud, we should piggyback on zookeeper for 
everything
  
 Support for large no:of cores and faster loading/unloading of cores
 ---

 Key: SOLR-1293
 URL: https://issues.apache.org/jira/browse/SOLR-1293
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Reporter: Noble Paul
 Fix For: 4.1

 Attachments: SOLR-1293.patch


 Solr, currently, is not very suitable for a large no:of homogeneous cores 
 where you require fast/frequent loading/unloading of cores. Usually a core 
 is required to be loaded just to fire a search query or to just index one 
 document.
 The requirements of such a system are:
 * Very efficient loading of cores. Solr cannot afford to read and parse and 
 create Schema, SolrConfig objects for each core each time the core has to be 
 loaded (SOLR-919, SOLR-920)
 * START/STOP core. Currently it is only possible to unload a core (SOLR-880)
 * Automatic loading of cores. If a core is present and it is not loaded and 
 a request comes for it, load it automatically before serving up the request
 * As there are a large no:of cores, all the cores cannot be kept loaded 
 always. There has to be an upper limit beyond which we need to unload a few 
 cores (probably the least recently used ones)
 * Automatic allotment of dataDir for cores. If the no:of cores is too high, all 
 the cores' dataDirs cannot live in the same dir. There is an upper limit on 
 the no:of dirs you can create in a unix dir w/o affecting performance

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores

2012-10-22 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481375#comment-13481375
 ] 

Jack Krupansky commented on SOLR-1293:
--

bq. Solr doesn't need to know

True, but what store would you propose using in unit tests? I suppose you could 
develop a mock RDBMS, which could be even simpler than Solr, so unit tests 
don't need a Solr instance running.


 Support for large no:of cores and faster loading/unloading of cores
 ---

 Key: SOLR-1293
 URL: https://issues.apache.org/jira/browse/SOLR-1293
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Reporter: Noble Paul
 Fix For: 4.1

 Attachments: SOLR-1293.patch


 Solr, currently, is not very suitable for a large no:of homogeneous cores 
 where you require fast/frequent loading/unloading of cores. Usually a core 
 is required to be loaded just to fire a search query or to just index one 
 document.
 The requirements of such a system are:
 * Very efficient loading of cores. Solr cannot afford to read and parse and 
 create Schema, SolrConfig objects for each core each time the core has to be 
 loaded (SOLR-919, SOLR-920)
 * START/STOP core. Currently it is only possible to unload a core (SOLR-880)
 * Automatic loading of cores. If a core is present and it is not loaded and 
 a request comes for it, load it automatically before serving up the request
 * As there are a large no:of cores, all the cores cannot be kept loaded 
 always. There has to be an upper limit beyond which we need to unload a few 
 cores (probably the least recently used ones)
 * Automatic allotment of dataDir for cores. If the no:of cores is too high, all 
 the cores' dataDirs cannot live in the same dir. There is an upper limit on 
 the no:of dirs you can create in a unix dir w/o affecting performance

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec

2012-10-22 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4498:
---

 Summary: pulse docfreq=1 DOCS_ONLY for 4.1 codec
 Key: LUCENE-4498
 URL: https://issues.apache.org/jira/browse/LUCENE-4498
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Robert Muir


We have pulsing codec, but currently this has some downsides:
* it's very general, wrapping an arbitrary postings format and pulsing everything 
in the postings for an arbitrary docfreq/totalTermFreq cutoff
* reuse is hairy: because it specializes its enums based on these cutoffs, when 
walking thru terms e.g. merging there is a lot of sophisticated stuff to avoid 
the worst cases where we clone indexinputs for tons of terms.

On the other hand the way the 4.1 codec encodes primary key fields is pretty 
silly, we write the docStartFP vlong in the term dictionary metadata, which 
tells us where to seek in the .doc to read our one lonely vint.

I think it's worth investigating that in the DOCS_ONLY docfreq=1 case, we just 
write the lone doc delta where we would write docStartFP. 

We can avoid the hairy reuse problem too, by just supporting this in 
refillDocs() in BlockDocsEnum instead of specializing.

This would remove the additional seek for primary key fields without really 
any of the downsides of pulsing today.
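
A sketch of the proposed special case; the method and variable names are 
illustrative, not the actual writer code:

// When a DOCS_ONLY term has docFreq == 1, reuse the metadata slot that
// normally holds the file pointer to carry the single doc delta instead,
// so reading the term needs no extra seek into the .doc file.
final class PulseSketch {
  static long termMetadataSlot(int docFreq, boolean docsOnly,
                               long singletonDocDelta, long docStartFP) {
    if (docsOnly && docFreq == 1) {
      return singletonDocDelta;   // "pulsed": the lone doc delta, inline
    }
    return docStartFP;            // normal case: pointer into the .doc file
  }
}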


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores

2012-10-22 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481380#comment-13481380
 ] 

Noble Paul commented on SOLR-1293:
--

If you wish to test the zk persistence feature, should we not just use an 
embedded zk?
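
A sketch of that test setup, assuming Solr's ZkTestServer test helper (treat 
the exact method names as illustrative):

import org.apache.solr.cloud.ZkTestServer;

public class ZkPersistenceTestSketch {
  public void testPersistence() throws Exception {
    // Spin up an in-process ZooKeeper, exercise the persistence code
    // against it, then shut it down.
    ZkTestServer zkServer = new ZkTestServer("target/zkdata");
    zkServer.run();
    try {
      String zkHost = zkServer.getZkAddress();
      // ... point the core-persistence code at zkHost and assert on it ...
    } finally {
      zkServer.shutdown();
    }
  }
}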

 Support for large no:of cores and faster loading/unloading of cores
 ---

 Key: SOLR-1293
 URL: https://issues.apache.org/jira/browse/SOLR-1293
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Reporter: Noble Paul
 Fix For: 4.1

 Attachments: SOLR-1293.patch


 Solr, currently, is not very suitable for a large no:of homogeneous cores 
 where you require fast/frequent loading/unloading of cores. Usually a core 
 is required to be loaded just to fire a search query or to just index one 
 document.
 The requirements of such a system are:
 * Very efficient loading of cores. Solr cannot afford to read and parse and 
 create Schema, SolrConfig objects for each core each time the core has to be 
 loaded (SOLR-919, SOLR-920)
 * START/STOP core. Currently it is only possible to unload a core (SOLR-880)
 * Automatic loading of cores. If a core is present and it is not loaded and 
 a request comes for it, load it automatically before serving up the request
 * As there are a large no:of cores, all the cores cannot be kept loaded 
 always. There has to be an upper limit beyond which we need to unload a few 
 cores (probably the least recently used ones)
 * Automatic allotment of dataDir for cores. If the no:of cores is too high, all 
 the cores' dataDirs cannot live in the same dir. There is an upper limit on 
 the no:of dirs you can create in a unix dir w/o affecting performance

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores

2012-10-22 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481386#comment-13481386
 ] 

Jack Krupansky commented on SOLR-1293:
--

bq. piggyback on zookeeper

That's okay, but zk is optimized for a small amount of configuration info - 1 
MB limit. Is "large number" times data-per-core going to be under 1 MB?

Is "large number" supposed to be hundreds, thousands, tens of thousands, 
hundreds of thousands, millions, ...? I mean, if a web site had millions of 
users, could they have one loadable core per user? The use case should be more 
specific about the goals.
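
A back-of-envelope check of that limit; the per-core footprint is an 
assumption, not a measurement:

// ZooKeeper's default znode payload cap (jute.maxbuffer) is about 1 MB.
// If each core's entry (name, instanceDir, dataDir, flags) serializes to
// roughly 100 bytes, one znode tops out around 10,000 cores; millions of
// cores would need sharding across many znodes or a different store.
final class ZnodeBudget {
  public static void main(String[] args) {
    long znodeLimitBytes = 1024L * 1024L;   // ~1 MB
    long bytesPerCore = 100L;               // assumed per-core footprint
    System.out.println(znodeLimitBytes / bytesPerCore + " cores per znode");
  }
}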



 Support for large no:of cores and faster loading/unloading of cores
 ---

 Key: SOLR-1293
 URL: https://issues.apache.org/jira/browse/SOLR-1293
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Reporter: Noble Paul
 Fix For: 4.1

 Attachments: SOLR-1293.patch


 Solr, currently, is not very suitable for a large no:of homogeneous cores 
 where you require fast/frequent loading/unloading of cores. Usually a core 
 is required to be loaded just to fire a search query or to just index one 
 document.
 The requirements of such a system are:
 * Very efficient loading of cores. Solr cannot afford to read and parse and 
 create Schema, SolrConfig objects for each core each time the core has to be 
 loaded (SOLR-919, SOLR-920)
 * START/STOP core. Currently it is only possible to unload a core (SOLR-880)
 * Automatic loading of cores. If a core is present and it is not loaded and 
 a request comes for it, load it automatically before serving up the request
 * As there are a large no:of cores, all the cores cannot be kept loaded 
 always. There has to be an upper limit beyond which we need to unload a few 
 cores (probably the least recently used ones)
 * Automatic allotment of dataDir for cores. If the no:of cores is too high, all 
 the cores' dataDirs cannot live in the same dir. There is an upper limit on 
 the no:of dirs you can create in a unix dir w/o affecting performance

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Example schema doc omission - omitPositions, omitTermFreqAndPositions, sortMissingFirst, and sortMissingLast

2012-10-22 Thread Jack Krupansky
The Solr example schema says “<!-- Valid attributes for fields:”, but omits 
omitPositions, omitTermFreqAndPositions, sortMissingFirst, and sortMissingLast.

It would also be helpful to have a clarifying note that distinguishes 
omitPositions and omitTermFreqAndPositions from termPositions and termVectors. 
I’m not positive, but is it simply that the omitXxx attributes control what 
gets indexed versus the termXxx attributes controlling what can be 
retrieved, and that settings of the latter do not influence the former?

-- Jack Krupansky

[jira] [Commented] (LUCENE-4006) system requirements is duplicated across versioned/unversioned

2012-10-22 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481407#comment-13481407
 ] 

Uwe Schindler commented on LUCENE-4006:
---

I committed the changes to the already published versioned 4.0 website (after 
communication with the RM Robert Muir). I will later remove the global docs and 
only refer to the per-version docs. 3.6.1 versioned forrest docs already 
contained the system requirements, so those don't need to be changed.

 system requirements is duplicated across versioned/unversioned
 --

 Key: LUCENE-4006
 URL: https://issues.apache.org/jira/browse/LUCENE-4006
 Project: Lucene - Core
  Issue Type: Task
  Components: general/javadocs
Reporter: Robert Muir
Assignee: Uwe Schindler
 Fix For: 4.1, 5.0, 4.0.1

 Attachments: LUCENE-4006.patch


 Our System requirements page is located here on the unversioned site: 
 http://lucene.apache.org/core/systemreqs.html
 But its also in forrest under each release. Can we just nuke the forrested 
 one?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores

2012-10-22 Thread Andrzej Rusin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481413#comment-13481413
 ] 

Andrzej Rusin commented on SOLR-1293:
-

Whatever would be the storage of the cores info, it would be nice to have some 
API and/or command line tools for (batch) manipulating the cores; what do you 
think?

 Support for large no:of cores and faster loading/unloading of cores
 ---

 Key: SOLR-1293
 URL: https://issues.apache.org/jira/browse/SOLR-1293
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Reporter: Noble Paul
 Fix For: 4.1

 Attachments: SOLR-1293.patch


 Solr, currently, is not very suitable for a large no:of homogeneous cores 
 where you require fast/frequent loading/unloading of cores. Usually a core 
 is required to be loaded just to fire a search query or to just index one 
 document.
 The requirements of such a system are:
 * Very efficient loading of cores. Solr cannot afford to read and parse and 
 create Schema, SolrConfig objects for each core each time the core has to be 
 loaded (SOLR-919, SOLR-920)
 * START/STOP core. Currently it is only possible to unload a core (SOLR-880)
 * Automatic loading of cores. If a core is present and it is not loaded and 
 a request comes for it, load it automatically before serving up the request
 * As there are a large no:of cores, all the cores cannot be kept loaded 
 always. There has to be an upper limit beyond which we need to unload a few 
 cores (probably the least recently used ones)
 * Automatic allotment of dataDir for cores. If the no:of cores is too high, all 
 the cores' dataDirs cannot live in the same dir. There is an upper limit on 
 the no:of dirs you can create in a unix dir w/o affecting performance

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4499) Multi-word synonym filter (synonym expansion at indexing time).

2012-10-22 Thread roman (JIRA)
roman created LUCENE-4499:
-

 Summary: Multi-word synonym filter (synonym expansion at indexing 
time).
 Key: LUCENE-4499
 URL: https://issues.apache.org/jira/browse/LUCENE-4499
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Affects Versions: 4.1, 5.0
Reporter: roman
Priority: Minor
 Fix For: 5.0


I apologize for bringing the multi-token synonym expansion up again. There is 
an old, unresolved issue at LUCENE-1622 [1]

While solving the problem for our needs [2], I discovered that the current 
SolrSynonym parser (and the wonderful FST) have almost everything to 
satisfactorily handle both the query and index time synonym expansion. It seems 
that people often need to use the synonym filter *slightly* differently at 
indexing and query time.

In our case, we must do different things during indexing and querying.

Example sentence: Mirrors of the Hubble space telescope pointed at XA5

This is what we need (comma marks position bump):
 
  indexing: mirrors,hubble|hubble space 
telescope|hst,space,telescope,pointed,xa5|astroobject#5
  querying: +mirrors +(hubble space telescope | hst) +pointed 
+(xa5|astroobject#5)
  

This translates to the following needs:
  indexing time: 
single-token synonyms = return only synonyms
multi-token synonyms = return original tokens AND the synonyms
 
We need the original tokens for the proximity queries, if we indexed 'hubble 
space telescope'
as one token, we cannot search for 'hubble NEAR telescope'

  query time:
single-token: return only its synonyms (but preserve case)
multi-token: return only synonyms



You may (not) be surprised, but Lucene already supports ALL these requirements. 
The patch is an attempt to state the problem differently. I am not sure if it 
is the best option, however it works perfectly for our needs and it seems it 
could work for general public too. Especially if the SynonymFilterFactory had a 
preconfigured set of SynonymMapBuilders - and people could just choose the one 
that fits their situation.


links:
[1] https://issues.apache.org/jira/browse/LUCENE-1622
[2] http://labs.adsabs.harvard.edu/trac/ads-invenio/ticket/158
[3] seems to have similar request: 
http://lucene.472066.n3.nabble.com/Proposal-Full-support-for-multi-word-synonyms-at-query-time-td4000522.html
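
For reference, the index-time half of this can already be expressed with 
SynonymMap.Builder from the Lucene 4.x analysis module. A minimal sketch using 
the example terms above, where includeOrig=true keeps the original tokens for 
the multi-token case:

import java.io.IOException;

import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.util.CharsRef;

final class IndexTimeSynonyms {
  static SynonymMap build() throws IOException {
    SynonymMap.Builder builder = new SynonymMap.Builder(true); // dedup entries
    CharsRef multi = SynonymMap.Builder.join(
        new String[] {"hubble", "space", "telescope"}, new CharsRef());
    builder.add(multi, new CharsRef("hst"), true);     // multi-token: keep originals
    builder.add(new CharsRef("xa5"),
                new CharsRef("astroobject#5"), false); // single-token: replace
    return builder.build();  // feed to SynonymFilter in the index-time analyzer
  }
}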




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4499) Multi-word synonym filter (synonym expansion)

2012-10-22 Thread roman (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

roman updated LUCENE-4499:
--

Summary: Multi-word synonym filter (synonym expansion)  (was: Multi-word 
synonym filter (synonym expansion at indexing time).)

 Multi-word synonym filter (synonym expansion)
 -

 Key: LUCENE-4499
 URL: https://issues.apache.org/jira/browse/LUCENE-4499
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Affects Versions: 4.1, 5.0
Reporter: roman
Priority: Minor
  Labels: analysis, multi-word, synonyms
 Fix For: 5.0


 I apologize for bringing the multi-token synonym expansion up again. There is 
 an old, unresolved issue at LUCENE-1622 [1]
 While solving the problem for our needs [2], I discovered that the current 
 SolrSynonym parser (and the wonderful FST) have almost everything to 
 satisfactorily handle both the query and index time synonym expansion. It 
 seems that people often need to use the synonym filter *slightly* differently 
 at indexing and query time.
 In our case, we must do different things during indexing and querying.
 Example sentence: Mirrors of the Hubble space telescope pointed at XA5
 This is what we need (comma marks position bump):
  
   indexing: mirrors,hubble|hubble space 
 telescope|hst,space,telescope,pointed,xa5|astroobject#5
   querying: +mirrors +(hubble space telescope | hst) +pointed 
 +(xa5|astroobject#5)
   
 This translates to the following needs:
   indexing time: 
 single-token synonyms = return only synonyms
 multi-token synonyms = return original tokens AND the synonyms
  
 We need the original tokens for the proximity queries, if we indexed 'hubble 
 space telescope'
 as one token, we cannot search for 'hubble NEAR telescope'
   query time:
 single-token: return only its synonyms (but preserve case)
 multi-token: return only synonyms
 You may (not) be surprised, but Lucene already supports ALL these 
 requirements. The patch is an attempt to state the problem differently. I am 
 not sure if it is the best option, however it works perfectly for our needs 
 and it seems it could work for general public too. Especially if the 
 SynonymFilterFactory had a preconfigured set of SynonymMapBuilders - and 
 people could just choose the one that fits their situation.
 links:
 [1] https://issues.apache.org/jira/browse/LUCENE-1622
 [2] http://labs.adsabs.harvard.edu/trac/ads-invenio/ticket/158
 [3] seems to have similar request: 
 http://lucene.472066.n3.nabble.com/Proposal-Full-support-for-multi-word-synonyms-at-query-time-td4000522.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4496) Don't decode unnecessary freq blocks in 4.1 codec

2012-10-22 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4496:


Attachment: LUCENE-4496.patch

Same patch, adding a few comments and beefing up TestBlockPostingsFormat3 to 
also check the freqs case.

I'll commit this shortly after running some more tests, and I think I want to 
now yank TestBlockPostingsFormat3 out of this package and let it run with any 
codec; it just tests these various subset cases and isn't specific to this PF.
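
For context, the expert flag the issue refers to is the flags argument on 
TermsEnum.docs(). A minimal caller-side sketch against the Lucene 4.x API:

import java.io.IOException;

import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.TermsEnum;

final class DocsOnlySketch {
  // Passing 0 instead of DocsEnum.FLAG_FREQS tells the codec freqs are not
  // needed, which lets it skip freq blocks instead of decoding them.
  static DocsEnum docsOnly(TermsEnum termsEnum) throws IOException {
    return termsEnum.docs(null, null, 0); // no live docs, no reuse, no freqs
  }
}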

 Don't decode unnecessary freq blocks in 4.1 codec
 -

 Key: LUCENE-4496
 URL: https://issues.apache.org/jira/browse/LUCENE-4496
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Affects Versions: 4.1
Reporter: Robert Muir
 Attachments: LUCENE-4496.patch, LUCENE-4496.patch, LUCENE-4496.patch


 TermsEnum.docs() has an expert flag to specify you don't require frequencies. 
 This is currently set by some things that don't need it: we should call 
 ForUtil.skipBlock instead of ForUtil.readBlock in this case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores

2012-10-22 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481428#comment-13481428
 ] 

Erick Erickson commented on SOLR-1293:
--

Well, I don't think the use-case I'm working on needs an API or command-line 
tools, so I probably won't be working on it. I'd be glad to commit it in if 
someone else wanted to do it.

 Support for large no:of cores and faster loading/unloading of cores
 ---

 Key: SOLR-1293
 URL: https://issues.apache.org/jira/browse/SOLR-1293
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Reporter: Noble Paul
 Fix For: 4.1

 Attachments: SOLR-1293.patch


 Solr, currently, is not very suitable for a large no:of homogeneous cores 
 where you require fast/frequent loading/unloading of cores. Usually a core 
 is required to be loaded just to fire a search query or to just index one 
 document.
 The requirements of such a system are:
 * Very efficient loading of cores. Solr cannot afford to read and parse and 
 create Schema, SolrConfig objects for each core each time the core has to be 
 loaded (SOLR-919, SOLR-920)
 * START/STOP core. Currently it is only possible to unload a core (SOLR-880)
 * Automatic loading of cores. If a core is present and it is not loaded and 
 a request comes for it, load it automatically before serving up the request
 * As there are a large no:of cores, all the cores cannot be kept loaded 
 always. There has to be an upper limit beyond which we need to unload a few 
 cores (probably the least recently used ones)
 * Automatic allotment of dataDir for cores. If the no:of cores is too high, all 
 the cores' dataDirs cannot live in the same dir. There is an upper limit on 
 the no:of dirs you can create in a unix dir w/o affecting performance

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4499) Multi-word synonym filter (synonym expansion)

2012-10-22 Thread roman (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

roman updated LUCENE-4499:
--

Description: 
I apologize for bringing the multi-token synonym expansion up again. There is 
an old, unresolved issue at LUCENE-1622 [1]

While solving the problem for our needs [2], I discovered that the current 
SolrSynonym parser (and the wonderful FST) have almost everything to 
satisfactorily handle both the query and index time synonym expansion. It seems 
that people often need to use the synonym filter *slightly* differently at 
indexing and query time.

In our case, we must do different things during indexing and querying.

Example sentence: Mirrors of the Hubble space telescope pointed at XA5

This is what we need (comma marks position bump):


indexing: mirrors,hubble|hubble space 
telescope|hst,space,telescope,pointed,xa5|astroobject#5
querying: +mirrors +(hubble space telescope | hst) +pointed +(xa5|astroobject#5)


This translates to the following needs:


  indexing time: 
single-token synonyms = return only synonyms
multi-token synonyms = return original tokens *AND* the synonyms

  query time:
single-token: return only synonyms (but preserve case)
multi-token: return only synonyms
 
We need the original tokens for the proximity queries, if we indexed 'hubble 
space telescope'
as one token, we cannot search for 'hubble NEAR telescope'



You may (not) be surprised, but Lucene already supports ALL of these 
requirements. The patch is an attempt to state the problem differently. I am 
not sure if it is the best option, however it works perfectly for our needs and 
it seems it could work for general public too. Especially if the 
SynonymFilterFactory had a preconfigured set of SynonymMapBuilders - and 
people would just choose the one that fits their situation. Please look at the unit test.


links:
[1] https://issues.apache.org/jira/browse/LUCENE-1622
[2] http://labs.adsabs.harvard.edu/trac/ads-invenio/ticket/158
[3] seems to have similar request: 
http://lucene.472066.n3.nabble.com/Proposal-Full-support-for-multi-word-synonyms-at-query-time-td4000522.html




  was:
I apologize for bringing the multi-token synonym expansion up again. There is 
an old, unresolved issue at LUCENE-1622 [1]

While solving the problem for our needs [2], I discovered that the current 
SolrSynonym parser (and the wonderful FST) have almost everything to 
satisfactorily handle both the query and index time synonym expansion. It seems 
that people often need to use the synonym filter *slightly* differently at 
indexing and query time.

In our case, we must do different things during indexing and querying.

Example sentence: Mirrors of the Hubble space telescope pointed at XA5

This is what we need (comma marks position bump):
 
  indexing: mirrors,hubble|hubble space 
telescope|hst,space,telescope,pointed,xa5|astroobject#5
  querying: +mirrors +(hubble space telescope | hst) +pointed 
+(xa5|astroobject#5)
  

This translates to the following needs:
  indexing time: 
single-token synonyms = return only synonyms
multi-token synonyms = return original tokens AND the synonyms
 
We need the original tokens for the proximity queries, if we indexed 'hubble 
space telescope'
as one token, we cannot search for 'hubble NEAR telescope'

  query time:
single-token: return only its synonyms (but preserve case)
multi-token: return only synonyms



You may (not) be surprised, but Lucene already supports ALL these requirements. 
The patch is an attempt to state the problem differently. I am not sure if it 
is the best option, however it works perfectly for our needs and it seems it 
could work for general public too. Especially if the SynonymFilterFactory had a 
preconfigured set of SynonymMapBuilders - and people could just choose the one 
that fits their situation.


links:
[1] https://issues.apache.org/jira/browse/LUCENE-1622
[2] http://labs.adsabs.harvard.edu/trac/ads-invenio/ticket/158
[3] seems to have similar request: 
http://lucene.472066.n3.nabble.com/Proposal-Full-support-for-multi-word-synonyms-at-query-time-td4000522.html





 Multi-word synonym filter (synonym expansion)
 -

 Key: LUCENE-4499
 URL: https://issues.apache.org/jira/browse/LUCENE-4499
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Affects Versions: 4.1, 5.0
Reporter: roman
Priority: Minor
  Labels: analysis, multi-word, synonyms
 Fix For: 5.0


 I apologize for bringing the multi-token synonym expansion up again. There is 
 an old, unresolved issue at LUCENE-1622 [1]
 While solving the problem for our needs [2], I discovered that the current 
 SolrSynonym parser (and the wonderful FST) have almost everything to 
 satisfactorily handle both the query and index time synonym 

[jira] [Updated] (LUCENE-4499) Multi-word synonym filter (synonym expansion)

2012-10-22 Thread roman (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

roman updated LUCENE-4499:
--

Attachment: LUCENE-4499.patch

Patch against latest trunk; I am seeing some unrelated unit tests failing.

 Multi-word synonym filter (synonym expansion)
 -

 Key: LUCENE-4499
 URL: https://issues.apache.org/jira/browse/LUCENE-4499
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Affects Versions: 4.1, 5.0
Reporter: roman
Priority: Minor
  Labels: analysis, multi-word, synonyms
 Fix For: 5.0

 Attachments: LUCENE-4499.patch


 I apologize for bringing the multi-token synonym expansion up again. There is 
 an old, unresolved issue at LUCENE-1622 [1]
 While solving the problem for our needs [2], I discovered that the current 
 SolrSynonym parser (and the wonderful FST) have almost everything to 
 satisfactorily handle both the query and index time synonym expansion. It 
 seems that people often need to use the synonym filter *slightly* differently 
 at indexing and query time.
 In our case, we must do different things during indexing and querying.
 Example sentence: Mirrors of the Hubble space telescope pointed at XA5
 This is what we need (comma marks position bump):
 indexing: mirrors,hubble|hubble space 
 telescope|hst,space,telescope,pointed,xa5|astroobject#5
 querying: +mirrors +(hubble space telescope | hst) +pointed 
 +(xa5|astroobject#5)
 This translates to the following needs:
   indexing time: 
 single-token synonyms = return only synonyms
 multi-token synonyms = return original tokens *AND* the synonyms
   query time:
 single-token: return only synonyms (but preserve case)
 multi-token: return only synonyms
  
 We need the original tokens for the proximity queries, if we indexed 'hubble 
 space telescope'
 as one token, we cannot search for 'hubble NEAR telescope'
 You may (not) be surprised, but Lucene already supports ALL of these 
 requirements. The patch is an attempt to state the problem differently. I am 
 not sure if it is the best option, however it works perfectly for our needs 
 and it seems it could work for general public too. Especially if the 
 SynonymFilterFactory had a preconfigured set of SynonymMapBuilders - and 
 people would just choose the one that fits their situation. Please look at the unit test.
 links:
 [1] https://issues.apache.org/jira/browse/LUCENE-1622
 [2] http://labs.adsabs.harvard.edu/trac/ads-invenio/ticket/158
 [3] seems to have similar request: 
 http://lucene.472066.n3.nabble.com/Proposal-Full-support-for-multi-word-synonyms-at-query-time-td4000522.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4496) Don't decode unnecessary freq blocks in 4.1 codec

2012-10-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481437#comment-13481437
 ] 

Robert Muir commented on LUCENE-4496:
-

I committed to trunk... will give it some time in jenkins before backporting.

 Don't decode unnecessary freq blocks in 4.1 codec
 -

 Key: LUCENE-4496
 URL: https://issues.apache.org/jira/browse/LUCENE-4496
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Affects Versions: 4.1
Reporter: Robert Muir
 Attachments: LUCENE-4496.patch, LUCENE-4496.patch, LUCENE-4496.patch


 TermsEnum.docs() has an expert flag to specify you don't require frequencies. 
 This is currently set by some things that don't need it: we should call 
 ForUtil.skipBlock instead of ForUtil.readBlock in this case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4497) Don't write posVIntCount in 4.1 codec

2012-10-22 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4497:


Attachment: LUCENE-4497.patch

updated patch to trunk. This is actually a nice little savings to the positions 
file with the luceneutil 1M collection.

trunk: 116425749 bytes
patch: 111340216 bytes


 Don't write posVIntCount in 4.1 codec
 -

 Key: LUCENE-4497
 URL: https://issues.apache.org/jira/browse/LUCENE-4497
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Robert Muir
 Attachments: LUCENE-4497.patch, LUCENE-4497.patch


 It's confusing and unnecessary that we compute this from docFreq for the 
 doc/freq vint count, but write it for the positions case: it's totalTermFreq % 
 BLOCK_SIZE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec

2012-10-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481480#comment-13481480
 ] 

Robert Muir commented on LUCENE-4498:
-

I will work on a patch after LUCENE-4497 has been reviewed... I've already 
conflicted myself with this PF today :)

 pulse docfreq=1 DOCS_ONLY for 4.1 codec
 ---

 Key: LUCENE-4498
 URL: https://issues.apache.org/jira/browse/LUCENE-4498
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Robert Muir

 We have pulsing codec, but currently this has some downsides:
 * it's very general, wrapping an arbitrary postings format and pulsing 
 everything in the postings for an arbitrary docfreq/totalTermFreq cutoff
 * reuse is hairy: because it specializes its enums based on these cutoffs, 
 when walking thru terms e.g. merging there is a lot of sophisticated stuff to 
 avoid the worst cases where we clone indexinputs for tons of terms.
 On the other hand the way the 4.1 codec encodes primary key fields is 
 pretty silly, we write the docStartFP vlong in the term dictionary metadata, 
 which tells us where to seek in the .doc to read our one lonely vint.
 I think it's worth investigating that in the DOCS_ONLY docfreq=1 case, we just 
 write the lone doc delta where we would write docStartFP. 
 We can avoid the hairy reuse problem too, by just supporting this in 
 refillDocs() in BlockDocsEnum instead of specializing.
 This would remove the additional seek for primary key fields without really 
 any of the downsides of pulsing today.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4497) Don't write posVIntCount in 4.1 codec

2012-10-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481488#comment-13481488
 ] 

Michael McCandless commented on LUCENE-4497:


+1, nice!

 Don't write posVIntCount in 4.1 codec
 -

 Key: LUCENE-4497
 URL: https://issues.apache.org/jira/browse/LUCENE-4497
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Robert Muir
 Attachments: LUCENE-4497.patch, LUCENE-4497.patch


 It's confusing and unnecessary that we compute this from docFreq for the 
 doc/freq vint count, but write it for the positions case: it's totalTermFreq % 
 BLOCK_SIZE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec

2012-10-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481514#comment-13481514
 ] 

Michael McCandless commented on LUCENE-4498:


+1

 pulse docfreq=1 DOCS_ONLY for 4.1 codec
 ---

 Key: LUCENE-4498
 URL: https://issues.apache.org/jira/browse/LUCENE-4498
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Robert Muir

 We have pulsing codec, but currently this has some downsides:
 * it's very general, wrapping an arbitrary postings format and pulsing 
 everything in the postings for an arbitrary docfreq/totalTermFreq cutoff
 * reuse is hairy: because it specializes its enums based on these cutoffs, 
 when walking thru terms e.g. merging there is a lot of sophisticated stuff to 
 avoid the worst cases where we clone indexinputs for tons of terms.
 On the other hand the way the 4.1 codec encodes primary key fields is 
 pretty silly, we write the docStartFP vlong in the term dictionary metadata, 
 which tells us where to seek in the .doc to read our one lonely vint.
 I think it's worth investigating that in the DOCS_ONLY docfreq=1 case, we just 
 write the lone doc delta where we would write docStartFP. 
 We can avoid the hairy reuse problem too, by just supporting this in 
 refillDocs() in BlockDocsEnum instead of specializing.
 This would remove the additional seek for primary key fields without really 
 any of the downsides of pulsing today.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores

2012-10-22 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481536#comment-13481536
 ] 

Noble Paul commented on SOLR-1293:
--

bq. Is large number supposed to be hundreds, thousands, tens of thousands, 
hundreds of thousands, millions, ...?

I'll be surprised if it ever crosses a few 1's. But let us say the upper 
limit is a 10; shouldn't it be simple to keep in ZK?



 Support for large no:of cores and faster loading/unloading of cores
 ---

 Key: SOLR-1293
 URL: https://issues.apache.org/jira/browse/SOLR-1293
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Reporter: Noble Paul
 Fix For: 4.1

 Attachments: SOLR-1293.patch


 Solr, currently, is not very suitable for a large no:of homogeneous cores 
 where you require fast/frequent loading/unloading of cores. Usually a core 
 is required to be loaded just to fire a search query or to just index one 
 document.
 The requirements of such a system are:
 * Very efficient loading of cores. Solr cannot afford to read and parse and 
 create Schema, SolrConfig objects for each core each time the core has to be 
 loaded (SOLR-919, SOLR-920)
 * START/STOP core. Currently it is only possible to unload a core (SOLR-880)
 * Automatic loading of cores. If a core is present and it is not loaded and 
 a request comes for it, load it automatically before serving up the request
 * As there are a large no:of cores, all the cores cannot be kept loaded 
 always. There has to be an upper limit beyond which we need to unload a few 
 cores (probably the least recently used ones)
 * Automatic allotment of dataDir for cores. If the no:of cores is too high, all 
 the cores' dataDirs cannot live in the same dir. There is an upper limit on 
 the no:of dirs you can create in a unix dir w/o affecting performance

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4497) Don't write posVIntCount in 4.1 codec

2012-10-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481551#comment-13481551
 ] 

Robert Muir commented on LUCENE-4497:
-

I committed to trunk. Will bake for a bit before backporting.

 Don't write posVIntCount in 4.1 codec
 -

 Key: LUCENE-4497
 URL: https://issues.apache.org/jira/browse/LUCENE-4497
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Robert Muir
 Attachments: LUCENE-4497.patch, LUCENE-4497.patch


 It's confusing and unnecessary that we compute this from docFreq for the 
 doc/freq vint count, but write it for the positions case: it's totalTermFreq % 
 BLOCK_SIZE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec

2012-10-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481553#comment-13481553
 ] 

Robert Muir commented on LUCENE-4498:
-

Actually I think for the other cases (not just DOCS_ONLY) we can pulse when 
totalTermFreq=1, as the freq is implicit.
We can just leave the positions and what not where they are.

I'll see how ugly it is...

 pulse docfreq=1 DOCS_ONLY for 4.1 codec
 ---

 Key: LUCENE-4498
 URL: https://issues.apache.org/jira/browse/LUCENE-4498
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Robert Muir

 We have pulsing codec, but currently this has some downsides:
 * it's very general, wrapping an arbitrary postings format and pulsing 
 everything in the postings for an arbitrary docfreq/totalTermFreq cutoff
 * reuse is hairy: because it specializes its enums based on these cutoffs, 
 when walking thru terms e.g. merging there is a lot of sophisticated stuff to 
 avoid the worst cases where we clone indexinputs for tons of terms.
 On the other hand the way the 4.1 codec encodes primary key fields is 
 pretty silly, we write the docStartFP vlong in the term dictionary metadata, 
 which tells us where to seek in the .doc to read our one lonely vint.
 I think it's worth investigating that in the DOCS_ONLY docfreq=1 case, we just 
 write the lone doc delta where we would write docStartFP. 
 We can avoid the hairy reuse problem too, by just supporting this in 
 refillDocs() in BlockDocsEnum instead of specializing.
 This would remove the additional seek for primary key fields without really 
 any of the downsides of pulsing today.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



heads up: reindex trunk indexes

2012-10-22 Thread Robert Muir
I committed https://issues.apache.org/jira/browse/LUCENE-4497. You
should reindex.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3977) Add [* TO *] option to spatial fields.

2012-10-22 Thread David Smiley (JIRA)
David Smiley created SOLR-3977:
--

 Summary: Add [* TO *] option to spatial fields.
 Key: SOLR-3977
 URL: https://issues.apache.org/jira/browse/SOLR-3977
 Project: Solr
  Issue Type: New Feature
Reporter: David Smiley
Priority: Minor


It would be nice to have [* TO *] work on a spatial field.  Not necessarily any 
range query, but this specific one.  I don't know if there are other non-spatial 
fields where this won't work, but it'd be nice if this were universal.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2305) DataImportScheduler

2012-10-22 Thread Marko Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481621#comment-13481621
 ] 

Marko Bonaci commented on SOLR-2305:


[~otis]
Got it! Will do...

 DataImportScheduler
 ---

 Key: SOLR-2305
 URL: https://issues.apache.org/jira/browse/SOLR-2305
 Project: Solr
  Issue Type: New Feature
Affects Versions: 4.0-ALPHA
Reporter: Bill Bell
 Fix For: 4.1

 Attachments: patch.txt, SOLR-2305-1.diff


 Marko Bonaci has updated the WIKI page to add the DataImportScheduler, but I 
 cannot find a JIRA ticket for it.
 http://wiki.apache.org/solr/DataImportHandler
 Do we have a ticket so the code can be tracked?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-22 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated SOLR-1972:


Attachment: SOLR-1972_metrics.patch

 Need additional query stats in admin interface - median, 95th and 99th 
 percentile
 -

 Key: SOLR-1972
 URL: https://issues.apache.org/jira/browse/SOLR-1972
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Shawn Heisey
Priority: Minor
 Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
 elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
 SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
 SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972.patch, 
 SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972-url_pattern.patch


 I would like to see more detailed query statistics from the admin GUI.  This 
 is what you can get now:
 requests : 809
 errors : 0
 timeouts : 0
 totalTime : 70053
 avgTimePerRequest : 86.59209
 avgRequestsPerSecond : 0.8148785 
 I'd like to see more data on the time per request - median, 95th percentile, 
 99th percentile, and any other statistical function that makes sense to 
 include.  In my environment, the first bunch of queries after startup tend to 
 take several seconds each.  I find that the average value tends to be useless 
 until it has several thousand queries under its belt and the caches are 
 thoroughly warmed.  The statistical functions I have mentioned would quickly 
 eliminate the influence of those initial slow queries.
 The system will have to store individual data about each query.  I don't know 
 if this is something Solr does already.  It would be nice to have a 
 configurable count of how many of the most recent data points are kept, to 
 control the amount of memory the feature uses.  The default value could be 
 something like 1024 or 4096.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec

2012-10-22 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4498:


Attachment: LUCENE-4498.patch

Initial patch (no file format docs yet, let's benchmark/measure first).

All tests pass.

 pulse docfreq=1 DOCS_ONLY for 4.1 codec
 ---

 Key: LUCENE-4498
 URL: https://issues.apache.org/jira/browse/LUCENE-4498
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Robert Muir
 Attachments: LUCENE-4498.patch


 We have a pulsing codec, but currently this has some downsides:
 * it's very general, wrapping an arbitrary postings format and pulsing 
 everything in the postings for an arbitrary docFreq/totalTermFreq cutoff
 * reuse is hairy: because it specializes its enums based on these cutoffs, 
 when walking through terms (e.g. merging) there is a lot of sophisticated 
 stuff to avoid the worst cases where we clone IndexInputs for tons of terms.
 On the other hand, the way the 4.1 codec encodes primary key fields is 
 pretty silly: we write the docStartFP vlong in the term dictionary metadata, 
 which tells us where to seek in the .doc to read our one lonely vint.
 I think it's worth investigating that in the DOCS_ONLY docFreq=1 case, we just 
 write the lone doc delta where we would write docStartFP. 
 We can avoid the hairy reuse problem too, by just supporting this in 
 refillDocs() in BlockDocsEnum instead of specializing.
 This would remove the additional seek for primary key fields without really 
 any of the downsides of pulsing today.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-22 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481655#comment-13481655
 ] 

Alan Woodward commented on SOLR-1972:
-

Here's a patch that uses the metrics library.  It doesn't include Eric's regex 
matching or anything at the moment - it basically just takes what's currently 
in trunk, refactors it to use metrics' Counter and Timer objects, and adds the 
rolling average data.

Cons:
  - it adds another dependency to solr-core.  It's a useful dependency, IMO, 
but still.
  - tests don't pass at the moment, as metrics spawns extra threads which the 
test runner doesn't know how to deal with

Pros:
  - it's a purpose-designed stats and metrics library, so we don't need to 
worry about the maths or sampling algorithms
  - it adds the functionality of the original ticket/patch in a much simpler 
way.

The ideal solution would be a component of some kind, I think, but this at 
least improves on what's in trunk at the moment.
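
For concreteness, a minimal sketch of the approach, assuming the metrics 2.x API; the class and metric names here are illustrative, not the patch's actual code:

{code}
import java.util.concurrent.TimeUnit;
import com.yammer.metrics.Metrics;
import com.yammer.metrics.core.Counter;
import com.yammer.metrics.core.Timer;
import com.yammer.metrics.core.TimerContext;
import com.yammer.metrics.stats.Snapshot;

public class RequestStats {
  private final Timer requestTimer = Metrics.newTimer(
      RequestStats.class, "requests", TimeUnit.MILLISECONDS, TimeUnit.SECONDS);
  private final Counter errors = Metrics.newCounter(RequestStats.class, "errors");

  public void onRequest(Runnable handler) {
    TimerContext ctx = requestTimer.time();  // start the clock
    try {
      handler.run();
    } catch (RuntimeException e) {
      errors.inc();
      throw e;
    } finally {
      ctx.stop();  // record the elapsed time in the timer's sample
    }
  }

  public double medianMillis() {
    Snapshot s = requestTimer.getSnapshot();
    return s.getMedian();  // also get95thPercentile(), get99thPercentile(), ...
  }
}
{code}

The percentile data comes from a fixed-size sampling reservoir inside the timer, so memory should stay bounded no matter how long the handler runs.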

 Need additional query stats in admin interface - median, 95th and 99th 
 percentile
 -

 Key: SOLR-1972
 URL: https://issues.apache.org/jira/browse/SOLR-1972
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Shawn Heisey
Priority: Minor
 Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
 elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
 SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
 SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972.patch, 
 SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972-url_pattern.patch


 I would like to see more detailed query statistics from the admin GUI.  This 
 is what you can get now:
 requests : 809
 errors : 0
 timeouts : 0
 totalTime : 70053
 avgTimePerRequest : 86.59209
 avgRequestsPerSecond : 0.8148785 
 I'd like to see more data on the time per request - median, 95th percentile, 
 99th percentile, and any other statistical function that makes sense to 
 include.  In my environment, the first bunch of queries after startup tend to 
 take several seconds each.  I find that the average value tends to be useless 
 until it has several thousand queries under its belt and the caches are 
 thoroughly warmed.  The statistical functions I have mentioned would quickly 
 eliminate the influence of those initial slow queries.
 The system will have to store individual data about each query.  I don't know 
 if this is something Solr does already.  It would be nice to have a 
 configurable count of how many of the most recent data points are kept, to 
 control the amount of memory the feature uses.  The default value could be 
 something like 1024 or 4096.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec

2012-10-22 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4498:


Attachment: LUCENE-4498.patch

Duh, I forgot to actually avoid the seek in the previous patch: here's the 
updated patch.

 pulse docfreq=1 DOCS_ONLY for 4.1 codec
 ---

 Key: LUCENE-4498
 URL: https://issues.apache.org/jira/browse/LUCENE-4498
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Robert Muir
 Attachments: LUCENE-4498.patch, LUCENE-4498.patch


 We have a pulsing codec, but currently this has some downsides:
 * it's very general, wrapping an arbitrary postings format and pulsing 
 everything in the postings for an arbitrary docFreq/totalTermFreq cutoff
 * reuse is hairy: because it specializes its enums based on these cutoffs, 
 when walking through terms (e.g. merging) there is a lot of sophisticated 
 stuff to avoid the worst cases where we clone IndexInputs for tons of terms.
 On the other hand, the way the 4.1 codec encodes primary key fields is 
 pretty silly: we write the docStartFP vlong in the term dictionary metadata, 
 which tells us where to seek in the .doc to read our one lonely vint.
 I think it's worth investigating that in the DOCS_ONLY docFreq=1 case, we just 
 write the lone doc delta where we would write docStartFP. 
 We can avoid the hairy reuse problem too, by just supporting this in 
 refillDocs() in BlockDocsEnum instead of specializing.
 This would remove the additional seek for primary key fields without really 
 any of the downsides of pulsing today.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError

2012-10-22 Thread Aaron Daubman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481696#comment-13481696
 ] 

Aaron Daubman commented on SOLR-3849:
-

This appears to still be affecting me in 4_0_0 (1400746), running under 
OS X 10.8.2 with:
$ java -version
java version 1.7.0_09
Java(TM) SE Runtime Environment (build 1.7.0_09-b05)
Java HotSpot(TM) 64-Bit Server VM (build 23.5-b02, mixed mode)

---snip---
$ ant test -Dtestcase=ScriptEngineTest
...
common.test:
[junit4:junit4] JUnit4 says مرحبا! Master seed: 4050036B906720D2
[junit4:junit4] Executing 1 suite with 1 JVM.
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest
[junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby
[junit4:junit4] Assumption #1: got: [null], expected: each not null
[junit4:junit4] OK  0.11s | ScriptEngineTest.testPut
[junit4:junit4] OK  0.01s | ScriptEngineTest.testEvalReader
[junit4:junit4] OK  0.01s | ScriptEngineTest.testEvalText
[junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByExtension
[junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByName
[junit4:junit4]   2 NOTE: test params are: codec=Lucene3x, 
sim=RandomSimilarityProvider(queryNorm=true,coord=crazy): {}, locale=es_DO, 
timezone=America/Godthab
[junit4:junit4]   2 NOTE: Mac OS X 10.8.2 x86_64/Oracle Corporation 1.7.0_09 
(64-bit)/cpus=8,threads=1,free=2966056,total=12320768
[junit4:junit4]   2 NOTE: All tests run in this JVM: [ScriptEngineTest]
[junit4:junit4]   2 NOTE: reproduce with: ant test  
-Dtestcase=ScriptEngineTest -Dtests.seed=4050036B906720D2 -Dtests.slow=true 
-Dtests.locale=es_DO -Dtests.timezone=America/Godthab 
-Dtests.file.encoding=US-ASCII
[junit4:junit4] ERROR   0.00s | ScriptEngineTest (suite) 
[junit4:junit4] Throwable #1: java.lang.AssertionError: System properties 
invariant violated.
[junit4:junit4] New keys:
[junit4:junit4]   sun.awt.enableExtraMouseButtons=true
[junit4:junit4]   sun.font.fontmanager=sun.font.CFontManager
[junit4:junit4] 
[junit4:junit4]at 
__randomizedtesting.SeedInfo.seed([4050036B906720D2]:0)
[junit4:junit4]at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66)
[junit4:junit4]at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
[junit4:junit4]at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
[junit4:junit4]at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
[junit4:junit4]at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
[junit4:junit4]at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
[junit4:junit4]at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
[junit4:junit4]at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
[junit4:junit4]at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
[junit4:junit4]at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
[junit4:junit4]at java.lang.Thread.run(Thread.java:722)
[junit4:junit4] Completed in 1.14s, 6 tests, 1 failure, 1 skipped  FAILURES!
[junit4:junit4] 
[junit4:junit4] 
[junit4:junit4] Tests with failures:
[junit4:junit4]   - org.apache.solr.update.processor.ScriptEngineTest (suite)
[junit4:junit4] 
[junit4:junit4] 
[junit4:junit4] JVM J0: 0.92 .. 2.86 = 1.94s
[junit4:junit4] Execution time total: 2.96 sec.
[junit4:junit4] Tests summary: 1 suite, 6 tests, 1 suite-level error, 1 ignored 
(1 assumption)

BUILD FAILED
/Users/adaubman/Projects/lucene_solr_4_0_0/build.xml:40: The following error 
occurred while executing this line:
/Users/adaubman/Projects/lucene_solr_4_0_0/solr/build.xml:179: The following 
error occurred while executing this line:
/Users/adaubman/Projects/lucene_solr_4_0_0/lucene/module-build.xml:63: The 
following error occurred while executing this line:
/Users/adaubman/Projects/lucene_solr_4_0_0/lucene/common-build.xml:1142: The 
following error occurred while executing this line:
/Users/adaubman/Projects/lucene_solr_4_0_0/lucene/common-build.xml:815: There 
were test failures: 1 suite, 6 tests, 1 suite-level error, 1 ignored (1 
assumption)

Total time: 24 seconds
---snip---

 ScriptEngineTest failure RE system properties and ThreadLeakError
 -

 Key: SOLR-3849
 URL: 

[jira] [Updated] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec

2012-10-22 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4498:


Attachment: LUCENE-4498_lazy.patch

Here is a patch with a lazy clone() of the DocsEnum: when someone isn't 
reusing the DocsEnum (e.g. doing term queries or whatever), they won't pay 
the price of NIOFS buffer reads etc. just for a primary key.

 pulse docfreq=1 DOCS_ONLY for 4.1 codec
 ---

 Key: LUCENE-4498
 URL: https://issues.apache.org/jira/browse/LUCENE-4498
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Robert Muir
 Attachments: LUCENE-4498_lazy.patch, LUCENE-4498.patch, 
 LUCENE-4498.patch


 We have a pulsing codec, but currently this has some downsides:
 * it's very general, wrapping an arbitrary postings format and pulsing 
 everything in the postings for an arbitrary docFreq/totalTermFreq cutoff
 * reuse is hairy: because it specializes its enums based on these cutoffs, 
 when walking through terms (e.g. merging) there is a lot of sophisticated 
 stuff to avoid the worst cases where we clone IndexInputs for tons of terms.
 On the other hand, the way the 4.1 codec encodes primary key fields is 
 pretty silly: we write the docStartFP vlong in the term dictionary metadata, 
 which tells us where to seek in the .doc to read our one lonely vint.
 I think it's worth investigating that in the DOCS_ONLY docFreq=1 case, we just 
 write the lone doc delta where we would write docStartFP. 
 We can avoid the hairy reuse problem too, by just supporting this in 
 refillDocs() in BlockDocsEnum instead of specializing.
 This would remove the additional seek for primary key fields without really 
 any of the downsides of pulsing today.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError

2012-10-22 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481716#comment-13481716
 ] 

Steven Rowe commented on SOLR-3849:
---

I see the exact same failure on OS X 10.8.2 w/ Java 1.7.0_07.  However, this 
test succeeds w/ Java 1.6.0_37.

 ScriptEngineTest failure RE system properties and ThreadLeakError
 -

 Key: SOLR-3849
 URL: https://issues.apache.org/jira/browse/SOLR-3849
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 5.0
 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
Reporter: David Smiley
Assignee: Uwe Schindler
 Fix For: 4.0, 5.0

 Attachments: SOLR-3849.patch


 100% reproducible for me:
 solr$ ant test  -Dtestcase=ScriptEngineTest
 {noformat}
 [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4
 [junit4:junit4] Executing 1 suite with 1 JVM.
 [junit4:junit4] 
 [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest
 [junit4:junit4] OK  0.17s | ScriptEngineTest.testPut
 [junit4:junit4] OK  0.02s | ScriptEngineTest.testEvalReader
 [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby
 [junit4:junit4] Assumption #1: got: [null], expected: each not null
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testEvalText
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByExtension
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByName
 [junit4:junit4]   2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING 
 Will linger awaiting termination of 2 leaked thread(s).
 [junit4:junit4]   2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 
 thread leaked from SUITE scope at 
 org.apache.solr.update.processor.ScriptEngineTest: 
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll 
 Starting to interrupt leaked threads:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE 
 There are still zombie threads that couldn't be terminated:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 NOTE: test params are: codec=SimpleText, 
 sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, 
 timezone=America/Edmonton
 [junit4:junit4]   2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
 [junit4:junit4]   2 NOTE: All tests run in this JVM: [ScriptEngineTest]
 [junit4:junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true 
 -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton 
 -Dtests.file.encoding=UTF-8
 [junit4:junit4] ERROR   0.00s | ScriptEngineTest (suite) 
 [junit4:junit4] Throwable #1: java.lang.AssertionError: System 
 properties invariant violated.
 [junit4:junit4] New keys:
 [junit4:junit4]   sun.awt.enableExtraMouseButtons=true
 [junit4:junit4]   sun.font.fontmanager=sun.font.CFontManager
 [junit4:junit4] 
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
 [junit4:junit4]  at java.lang.Thread.run(Thread.java:722)
 [junit4:junit4] 

[jira] [Created] (LUCENE-4500) Loosen up DirectSpellChecker's minPrefix requirements

2012-10-22 Thread Erik Hatcher (JIRA)
Erik Hatcher created LUCENE-4500:


 Summary: Loosen up DirectSpellChecker's minPrefix requirements
 Key: LUCENE-4500
 URL: https://issues.apache.org/jira/browse/LUCENE-4500
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Erik Hatcher
Priority: Minor


DirectSpellChecker currently mandates a minPrefix of 1 when editDistance=2.  
This prohibits a query of nusglasses from matching the indexed sunglasses 
term.

Granted, there can be performance issues with using a minPrefix of 0, but it's 
a risk that a user should be allowed to take if needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError

2012-10-22 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481721#comment-13481721
 ] 

Dawid Weiss commented on SOLR-3849:
---

Interesting. We could ignore those properties, but they indicate that an AWT 
daemon was for some reason started up and messed up system properties. Uwe may 
want to kill it rather than just ignoring these props.

 ScriptEngineTest failure RE system properties and ThreadLeakError
 -

 Key: SOLR-3849
 URL: https://issues.apache.org/jira/browse/SOLR-3849
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 5.0
 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
Reporter: David Smiley
Assignee: Uwe Schindler
 Fix For: 4.0, 5.0

 Attachments: SOLR-3849.patch


 100% reproducible for me:
 solr$ ant test  -Dtestcase=ScriptEngineTest
 {noformat}
 [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4
 [junit4:junit4] Executing 1 suite with 1 JVM.
 [junit4:junit4] 
 [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest
 [junit4:junit4] OK  0.17s | ScriptEngineTest.testPut
 [junit4:junit4] OK  0.02s | ScriptEngineTest.testEvalReader
 [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby
 [junit4:junit4] Assumption #1: got: [null], expected: each not null
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testEvalText
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByExtension
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByName
 [junit4:junit4]   2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING 
 Will linger awaiting termination of 2 leaked thread(s).
 [junit4:junit4]   2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 
 thread leaked from SUITE scope at 
 org.apache.solr.update.processor.ScriptEngineTest: 
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll 
 Starting to interrupt leaked threads:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE 
 There are still zombie threads that couldn't be terminated:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 NOTE: test params are: codec=SimpleText, 
 sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, 
 timezone=America/Edmonton
 [junit4:junit4]   2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
 [junit4:junit4]   2 NOTE: All tests run in this JVM: [ScriptEngineTest]
 [junit4:junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true 
 -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton 
 -Dtests.file.encoding=UTF-8
 [junit4:junit4] ERROR   0.00s | ScriptEngineTest (suite) 
 [junit4:junit4] Throwable #1: java.lang.AssertionError: System 
 properties invariant violated.
 [junit4:junit4] New keys:
 [junit4:junit4]   sun.awt.enableExtraMouseButtons=true
 [junit4:junit4]   sun.font.fontmanager=sun.font.CFontManager
 [junit4:junit4] 
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 

[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError

2012-10-22 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481722#comment-13481722
 ] 

Uwe Schindler commented on SOLR-3849:
-

Does anybody of you maybe have a custom scripting engine in the classpath? This 
could cause some non-JDK script environment to boot up and modify those system 
variables. Maybe Apple/Macintosh has some CrazyUselessAsAlwaysMäcintrashEngine 
shipped by default.

 ScriptEngineTest failure RE system properties and ThreadLeakError
 -

 Key: SOLR-3849
 URL: https://issues.apache.org/jira/browse/SOLR-3849
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 5.0
 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
Reporter: David Smiley
Assignee: Uwe Schindler
 Fix For: 4.0, 5.0

 Attachments: SOLR-3849.patch


 100% reproducible for me:
 solr$ ant test  -Dtestcase=ScriptEngineTest
 {noformat}
 [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4
 [junit4:junit4] Executing 1 suite with 1 JVM.
 [junit4:junit4] 
 [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest
 [junit4:junit4] OK  0.17s | ScriptEngineTest.testPut
 [junit4:junit4] OK  0.02s | ScriptEngineTest.testEvalReader
 [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby
 [junit4:junit4] Assumption #1: got: [null], expected: each not null
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testEvalText
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByExtension
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByName
 [junit4:junit4]   2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING 
 Will linger awaiting termination of 2 leaked thread(s).
 [junit4:junit4]   2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 
 thread leaked from SUITE scope at 
 org.apache.solr.update.processor.ScriptEngineTest: 
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll 
 Starting to interrupt leaked threads:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE 
 There are still zombie threads that couldn't be terminated:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 NOTE: test params are: codec=SimpleText, 
 sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, 
 timezone=America/Edmonton
 [junit4:junit4]   2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
 [junit4:junit4]   2 NOTE: All tests run in this JVM: [ScriptEngineTest]
 [junit4:junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true 
 -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton 
 -Dtests.file.encoding=UTF-8
 [junit4:junit4] ERROR   0.00s | ScriptEngineTest (suite) 
 [junit4:junit4] Throwable #1: java.lang.AssertionError: System 
 properties invariant violated.
 [junit4:junit4] New keys:
 [junit4:junit4]   sun.awt.enableExtraMouseButtons=true
 [junit4:junit4]   sun.font.fontmanager=sun.font.CFontManager
 [junit4:junit4] 
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 

[jira] [Commented] (LUCENE-4500) Loosen up DirectSpellChecker's minPrefix requirements

2012-10-22 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481723#comment-13481723
 ] 

Erik Hatcher commented on LUCENE-4500:
--

This patch to DirectSpellChecker does the trick (using accuracy=0.8 or less in 
the description example):

{code}
-FuzzyTermsEnum e = new FuzzyTermsEnum(terms, atts, term, editDistance, 
Math.max(minPrefix, editDistance-1), true);
+FuzzyTermsEnum e = new FuzzyTermsEnum(terms, atts, term, editDistance, 
minPrefix, true);
{code}

In a conversation with Robert Muir, we agreed that this, rather, should keep 
the default that restricts to minPrefix=1 when editDistance=2, but make it 
optional to use a minPrefix of 0.

 Loosen up DirectSpellChecker's minPrefix requirements
 -

 Key: LUCENE-4500
 URL: https://issues.apache.org/jira/browse/LUCENE-4500
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Erik Hatcher
Priority: Minor

 DirectSpellChecker currently mandates a minPrefix of 1 when editDistance=2.  
 This prohibits a query of nusglasses from matching the indexed sunglasses 
 term.
 Granted, there can be performance issues with using a minPrefix of 0, but 
 it's a risk that a user should be allowed to take if needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4500) Loosen up DirectSpellChecker's minPrefix requirements

2012-10-22 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481723#comment-13481723
 ] 

Erik Hatcher edited comment on LUCENE-4500 at 10/22/12 7:49 PM:


This patch to DirectSpellChecker does the trick (using accuracy=0.8 or less in 
the description example):

{code}
-FuzzyTermsEnum e = new FuzzyTermsEnum(terms, atts, term, editDistance, 
Math.max(minPrefix, editDistance-1), true);
+FuzzyTermsEnum e = new FuzzyTermsEnum(terms, atts, term, editDistance, 
minPrefix, true);
{code}

In a conversation with Robert Muir, we agreed that this, rather, should keep 
the default that restricts to minPrefix=1 when editDistance=2, but make it 
optional to allow using a minPrefix of 0.

  was (Author: ehatcher):
This patch to DirectSpellChecker does the trick (using accuracy=0.8 or less 
in the description example):

{code}
-FuzzyTermsEnum e = new FuzzyTermsEnum(terms, atts, term, editDistance, 
Math.max(minPrefix, editDistance-1), true);
+FuzzyTermsEnum e = new FuzzyTermsEnum(terms, atts, term, editDistance, 
minPrefix, true);
{code}

In a conversation with Robert Muir, we agreed that this, rather, should keep 
the default that restricts to minPrefix=1 when editDistance=2, but made 
optional to use a minPrefix=0.
  
 Loosen up DirectSpellChecker's minPrefix requirements
 -

 Key: LUCENE-4500
 URL: https://issues.apache.org/jira/browse/LUCENE-4500
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Erik Hatcher
Priority: Minor

 DirectSpellChecker currently mandates a minPrefix of 1 when editDistance=2.  
 This prohibits a query of nusglasses from matching the indexed sunglasses 
 term.
 Granted, there can be performance issues with using a minPrefix of 0, but 
 it's a risk that a user should be allowed to take if needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError

2012-10-22 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481727#comment-13481727
 ] 

Dawid Weiss commented on SOLR-3849:
---

The way to check is to substitute system properties with a custom 
implementation of Properties, override setProperty, and dump a stack trace 
when these are actually set, to see who the offender is. I'll take a look 
unless somebody beats me to it. A minimal sketch of that trick is below.
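
{code}
import java.util.Properties;

// Illustrative only: replace the JVM's system properties with a subclass
// that dumps a stack trace whenever a property is set, to identify the
// offender.
public class TracingProperties extends Properties {
  public TracingProperties(Properties current) {
    putAll(current);
  }

  @Override
  public synchronized Object setProperty(String key, String value) {
    new Throwable("setProperty(" + key + "=" + value + ")").printStackTrace();
    return super.setProperty(key, value);
  }

  public static void install() {
    System.setProperties(new TracingProperties(System.getProperties()));
  }
}
{code}

Depending on how the offending code writes the property, put() may need the same override.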


 ScriptEngineTest failure RE system properties and ThreadLeakError
 -

 Key: SOLR-3849
 URL: https://issues.apache.org/jira/browse/SOLR-3849
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 5.0
 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
Reporter: David Smiley
Assignee: Uwe Schindler
 Fix For: 4.0, 5.0

 Attachments: SOLR-3849.patch


 100% reproducible for me:
 solr$ ant test  -Dtestcase=ScriptEngineTest
 {noformat}
 [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4
 [junit4:junit4] Executing 1 suite with 1 JVM.
 [junit4:junit4] 
 [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest
 [junit4:junit4] OK  0.17s | ScriptEngineTest.testPut
 [junit4:junit4] OK  0.02s | ScriptEngineTest.testEvalReader
 [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby
 [junit4:junit4] Assumption #1: got: [null], expected: each not null
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testEvalText
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByExtension
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByName
 [junit4:junit4]   2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING 
 Will linger awaiting termination of 2 leaked thread(s).
 [junit4:junit4]   2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 
 thread leaked from SUITE scope at 
 org.apache.solr.update.processor.ScriptEngineTest: 
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll 
 Starting to interrupt leaked threads:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE 
 There are still zombie threads that couldn't be terminated:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 NOTE: test params are: codec=SimpleText, 
 sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, 
 timezone=America/Edmonton
 [junit4:junit4]   2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
 [junit4:junit4]   2 NOTE: All tests run in this JVM: [ScriptEngineTest]
 [junit4:junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true 
 -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton 
 -Dtests.file.encoding=UTF-8
 [junit4:junit4] ERROR   0.00s | ScriptEngineTest (suite) 
 [junit4:junit4] Throwable #1: java.lang.AssertionError: System 
 properties invariant violated.
 [junit4:junit4] New keys:
 [junit4:junit4]   sun.awt.enableExtraMouseButtons=true
 [junit4:junit4]   sun.font.fontmanager=sun.font.CFontManager
 [junit4:junit4] 
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 

[jira] [Commented] (LUCENE-4500) Loosen up DirectSpellChecker's minPrefix requirements

2012-10-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481728#comment-13481728
 ] 

Robert Muir commented on LUCENE-4500:
-

Yeah, I think we should add an option to disable this heuristic.

It was basically a perf/relevance thing (in general, edits of 2, especially 
considering a transposition is a single edit, along with a minPrefix of 0, 
can yield surprisingly irrelevant stuff).

But if someone wants that... let them do it.
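
A hypothetical sketch of what such an opt-in could look like inside DirectSpellChecker (illustrative fragment, not committed API):

{code}
// Hypothetical flag: keep the conservative clamp by default, but let
// callers accept the perf/relevance risk of a zero prefix themselves.
boolean allowZeroPrefix = false;  // illustrative option

int effectivePrefix = allowZeroPrefix
    ? minPrefix                               // may be 0 if the user asks
    : Math.max(minPrefix, editDistance - 1);  // current default behavior
FuzzyTermsEnum e = new FuzzyTermsEnum(terms, atts, term, editDistance,
    effectivePrefix, true);
{code}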

 Loosen up DirectSpellChecker's minPrefix requirements
 -

 Key: LUCENE-4500
 URL: https://issues.apache.org/jira/browse/LUCENE-4500
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Erik Hatcher
Priority: Minor

 DirectSpellChecker currently mandates a minPrefix of 1 when editDistance=2.  
 This prohibits a query of nusglasses from matching the indexed sunglasses 
 term.
 Granted, there can be performance issues with using a minPrefix of 0, but 
 it's a risk that a user should be allowed to take if needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError

2012-10-22 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481741#comment-13481741
 ] 

Uwe Schindler commented on SOLR-3849:
-

The strange thing about this issue is still the fact that we have:
{code:xml}
<sysproperty key="java.awt.headless" value="true"/>
{code}
Why is AWT booted up at all? This seems to be some OS X Java bug.

 ScriptEngineTest failure RE system properties and ThreadLeakError
 -

 Key: SOLR-3849
 URL: https://issues.apache.org/jira/browse/SOLR-3849
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 5.0
 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
Reporter: David Smiley
Assignee: Uwe Schindler
 Fix For: 4.0, 5.0

 Attachments: SOLR-3849.patch


 100% reproducible for me:
 solr$ ant test  -Dtestcase=ScriptEngineTest
 {noformat}
 [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4
 [junit4:junit4] Executing 1 suite with 1 JVM.
 [junit4:junit4] 
 [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest
 [junit4:junit4] OK  0.17s | ScriptEngineTest.testPut
 [junit4:junit4] OK  0.02s | ScriptEngineTest.testEvalReader
 [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby
 [junit4:junit4] Assumption #1: got: [null], expected: each not null
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testEvalText
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByExtension
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByName
 [junit4:junit4]   2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING 
 Will linger awaiting termination of 2 leaked thread(s).
 [junit4:junit4]   2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 
 thread leaked from SUITE scope at 
 org.apache.solr.update.processor.ScriptEngineTest: 
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll 
 Starting to interrupt leaked threads:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE 
 There are still zombie threads that couldn't be terminated:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 NOTE: test params are: codec=SimpleText, 
 sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, 
 timezone=America/Edmonton
 [junit4:junit4]   2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
 [junit4:junit4]   2 NOTE: All tests run in this JVM: [ScriptEngineTest]
 [junit4:junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true 
 -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton 
 -Dtests.file.encoding=UTF-8
 [junit4:junit4] ERROR   0.00s | ScriptEngineTest (suite) 
 [junit4:junit4] Throwable #1: java.lang.AssertionError: System 
 properties invariant violated.
 [junit4:junit4] New keys:
 [junit4:junit4]   sun.awt.enableExtraMouseButtons=true
 [junit4:junit4]   sun.font.fontmanager=sun.font.CFontManager
 [junit4:junit4] 
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 

[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError

2012-10-22 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481744#comment-13481744
 ] 

Steven Rowe commented on SOLR-3849:
---

bq. Does anybody of you maybe have a custom scripting engine in the classpath?

My CLASSPATH env. var. is undefined.

bq. Maybe Apple/Macintosh has some CrazyUselessAsAlwaysMäcintrashEngine shipped 
by default.
 
Oracle produces the 1.7 JDK for OS X, and the 1.6 JDK comes from Apple.


 ScriptEngineTest failure RE system properties and ThreadLeakError
 -

 Key: SOLR-3849
 URL: https://issues.apache.org/jira/browse/SOLR-3849
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 5.0
 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
Reporter: David Smiley
Assignee: Uwe Schindler
 Fix For: 4.0, 5.0

 Attachments: SOLR-3849.patch


 100% reproducible for me:
 solr$ ant test  -Dtestcase=ScriptEngineTest
 {noformat}
 [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4
 [junit4:junit4] Executing 1 suite with 1 JVM.
 [junit4:junit4] 
 [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest
 [junit4:junit4] OK  0.17s | ScriptEngineTest.testPut
 [junit4:junit4] OK  0.02s | ScriptEngineTest.testEvalReader
 [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby
 [junit4:junit4] Assumption #1: got: [null], expected: each not null
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testEvalText
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByExtension
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByName
 [junit4:junit4]   2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING 
 Will linger awaiting termination of 2 leaked thread(s).
 [junit4:junit4]   2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 
 thread leaked from SUITE scope at 
 org.apache.solr.update.processor.ScriptEngineTest: 
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll 
 Starting to interrupt leaked threads:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE 
 There are still zombie threads that couldn't be terminated:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 NOTE: test params are: codec=SimpleText, 
 sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, 
 timezone=America/Edmonton
 [junit4:junit4]   2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
 [junit4:junit4]   2 NOTE: All tests run in this JVM: [ScriptEngineTest]
 [junit4:junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true 
 -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton 
 -Dtests.file.encoding=UTF-8
 [junit4:junit4] ERROR   0.00s | ScriptEngineTest (suite) 
 [junit4:junit4] Throwable #1: java.lang.AssertionError: System 
 properties invariant violated.
 [junit4:junit4] New keys:
 [junit4:junit4]   sun.awt.enableExtraMouseButtons=true
 [junit4:junit4]   sun.font.fontmanager=sun.font.CFontManager
 [junit4:junit4] 
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 

[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError

2012-10-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481745#comment-13481745
 ] 

Robert Muir commented on SOLR-3849:
---

When I run 'ant check-svn-working-copy' (even on 1.6) on my Apple machine, it 
boots up AWT as well.

I thought we were passing headless to all this stuff now?

 ScriptEngineTest failure RE system properties and ThreadLeakError
 -

 Key: SOLR-3849
 URL: https://issues.apache.org/jira/browse/SOLR-3849
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 5.0
 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
Reporter: David Smiley
Assignee: Uwe Schindler
 Fix For: 4.0, 5.0

 Attachments: SOLR-3849.patch


 100% reproducible for me:
 solr$ ant test  -Dtestcase=ScriptEngineTest
 {noformat}
 [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4
 [junit4:junit4] Executing 1 suite with 1 JVM.
 [junit4:junit4] 
 [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest
 [junit4:junit4] OK  0.17s | ScriptEngineTest.testPut
 [junit4:junit4] OK  0.02s | ScriptEngineTest.testEvalReader
 [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby
 [junit4:junit4] Assumption #1: got: [null], expected: each not null
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testEvalText
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByExtension
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByName
 [junit4:junit4]   2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING 
 Will linger awaiting termination of 2 leaked thread(s).
 [junit4:junit4]   2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 
 thread leaked from SUITE scope at 
 org.apache.solr.update.processor.ScriptEngineTest: 
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll 
 Starting to interrupt leaked threads:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE 
 There are still zombie threads that couldn't be terminated:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 NOTE: test params are: codec=SimpleText, 
 sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, 
 timezone=America/Edmonton
 [junit4:junit4]   2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
 [junit4:junit4]   2 NOTE: All tests run in this JVM: [ScriptEngineTest]
 [junit4:junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true 
 -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton 
 -Dtests.file.encoding=UTF-8
 [junit4:junit4] ERROR   0.00s | ScriptEngineTest (suite) 
 [junit4:junit4] Throwable #1: java.lang.AssertionError: System 
 properties invariant violated.
 [junit4:junit4] New keys:
 [junit4:junit4]   sun.awt.enableExtraMouseButtons=true
 [junit4:junit4]   sun.font.fontmanager=sun.font.CFontManager
 [junit4:junit4] 
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
 [junit4:junit4]  at 

[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-22 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481751#comment-13481751
 ] 

Shawn Heisey commented on SOLR-1972:


Awesome, Alan!  What options might we have to prevent long-running handlers 
from accumulating huge metrics histories and chewing up tons of RAM?  Is there 
a get75thpercentile method?  With the old patch, I do 75, 95, and 99.  I would 
also like to add 99.9, but the old patch uses ints so that wasn't possible.

When I have a moment, I will attempt to look at the javadocs for the package 
and answer my own questions.  Unless you get to it first, I will also attempt 
to mod the patch to expose any memory-limiting options.


 Need additional query stats in admin interface - median, 95th and 99th 
 percentile
 -

 Key: SOLR-1972
 URL: https://issues.apache.org/jira/browse/SOLR-1972
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Shawn Heisey
Priority: Minor
 Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
 elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
 SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
 SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972.patch, 
 SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972-url_pattern.patch


 I would like to see more detailed query statistics from the admin GUI.  This 
 is what you can get now:
 requests : 809
 errors : 0
 timeouts : 0
 totalTime : 70053
 avgTimePerRequest : 86.59209
 avgRequestsPerSecond : 0.8148785 
 I'd like to see more data on the time per request - median, 95th percentile, 
 99th percentile, and any other statistical function that makes sense to 
 include.  In my environment, the first bunch of queries after startup tend to 
 take several seconds each.  I find that the average value tends to be useless 
 until it has several thousand queries under its belt and the caches are 
 thoroughly warmed.  The statistical functions I have mentioned would quickly 
 eliminate the influence of those initial slow queries.
 The system will have to store individual data about each query.  I don't know 
 if this is something Solr does already.  It would be nice to have a 
 configurable count of how many of the most recent data points are kept, to 
 control the amount of memory the feature uses.  The default value could be 
 something like 1024 or 4096.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError

2012-10-22 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481762#comment-13481762
 ] 

Uwe Schindler commented on SOLR-3849:
-

Digging around the source code of OpenJDK, I found the following horrible class:
http://cr.openjdk.java.net/~michaelm/7113349/7u4/1/jdk/new/raw_files/new/src/macosx/classes/apple/applescript/AppleScriptEngineFactory.java

In fact this is the factory class for (as I said before) Apple's custom 
AppleScript engine. If you look at the static ctor, you know what's happening: 
as soon as the scripting engine manager loads the factory class via SPI 
from rt.jar, this code is executed and boots up AWT. The question is why 
java.awt.headless=true does not prevent this; I assume the if statement 
checking for it is missing.
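
A minimal sketch of the guard that appears to be missing (the class name is taken from the linked OpenJDK source; the initializer body below is an illustrative reconstruction, not the verbatim JDK code):

{code}
import java.awt.GraphicsEnvironment;
import java.awt.Toolkit;

// Illustrative reconstruction: what AppleScriptEngineFactory's static
// initializer effectively does today, plus the headless check that seems
// to be absent.
final class AppleScriptEngineFactorySketch {
  static {
    if (!GraphicsEnvironment.isHeadless()) {  // honors -Djava.awt.headless=true
      Toolkit.getDefaultToolkit();            // boots AWT (the AppKit thread on OS X)
    }
    // Without the isHeadless() check, merely loading this factory via SPI
    // from rt.jar starts AWT even in headless test runs.
  }
}
{code}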

 ScriptEngineTest failure RE system properties and ThreadLeakError
 -

 Key: SOLR-3849
 URL: https://issues.apache.org/jira/browse/SOLR-3849
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 5.0
 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
Reporter: David Smiley
Assignee: Uwe Schindler
 Fix For: 4.0, 5.0

 Attachments: SOLR-3849.patch


 100% reproducible for me:
 solr$ ant test  -Dtestcase=ScriptEngineTest
 {noformat}
 [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4
 [junit4:junit4] Executing 1 suite with 1 JVM.
 [junit4:junit4] 
 [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest
 [junit4:junit4] OK  0.17s | ScriptEngineTest.testPut
 [junit4:junit4] OK  0.02s | ScriptEngineTest.testEvalReader
 [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby
 [junit4:junit4] Assumption #1: got: [null], expected: each not null
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testEvalText
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByExtension
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByName
 [junit4:junit4]   2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING 
 Will linger awaiting termination of 2 leaked thread(s).
 [junit4:junit4]   2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 
 thread leaked from SUITE scope at 
 org.apache.solr.update.processor.ScriptEngineTest: 
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll 
 Starting to interrupt leaked threads:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE 
 There are still zombie threads that couldn't be terminated:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 NOTE: test params are: codec=SimpleText, 
 sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, 
 timezone=America/Edmonton
 [junit4:junit4]   2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
 [junit4:junit4]   2 NOTE: All tests run in this JVM: [ScriptEngineTest]
 [junit4:junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true 
 -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton 
 -Dtests.file.encoding=UTF-8
 [junit4:junit4] ERROR   0.00s | ScriptEngineTest (suite) 
 [junit4:junit4] Throwable #1: java.lang.AssertionError: System 
 properties invariant violated.
 [junit4:junit4] New keys:
 [junit4:junit4]   sun.awt.enableExtraMouseButtons=true
 [junit4:junit4]   sun.font.fontmanager=sun.font.CFontManager
 [junit4:junit4] 
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 [junit4:junit4]  at 
 

[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError

2012-10-22 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481766#comment-13481766
 ] 

Uwe Schindler commented on SOLR-3849:
-

We should file a bug with Oracle telling them that this scripting engine does 
not respect the headless setting.

 ScriptEngineTest failure RE system properties and ThreadLeakError
 -

 Key: SOLR-3849
 URL: https://issues.apache.org/jira/browse/SOLR-3849
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 5.0
 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
Reporter: David Smiley
Assignee: Uwe Schindler
 Fix For: 4.0, 5.0

 Attachments: SOLR-3849.patch


 100% reproducible for me:
 solr$ ant test  -Dtestcase=ScriptEngineTest
 {noformat}
 [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4
 [junit4:junit4] Executing 1 suite with 1 JVM.
 [junit4:junit4] 
 [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest
 [junit4:junit4] OK  0.17s | ScriptEngineTest.testPut
 [junit4:junit4] OK  0.02s | ScriptEngineTest.testEvalReader
 [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby
 [junit4:junit4] Assumption #1: got: [null], expected: each not null
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testEvalText
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByExtension
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByName
 [junit4:junit4]   2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING 
 Will linger awaiting termination of 2 leaked thread(s).
 [junit4:junit4]   2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 
 thread leaked from SUITE scope at 
 org.apache.solr.update.processor.ScriptEngineTest: 
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll 
 Starting to interrupt leaked threads:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE 
 There are still zombie threads that couldn't be terminated:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 NOTE: test params are: codec=SimpleText, 
 sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, 
 timezone=America/Edmonton
 [junit4:junit4]   2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
 [junit4:junit4]   2 NOTE: All tests run in this JVM: [ScriptEngineTest]
 [junit4:junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true 
 -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton 
 -Dtests.file.encoding=UTF-8
 [junit4:junit4] ERROR   0.00s | ScriptEngineTest (suite) 
 [junit4:junit4] Throwable #1: java.lang.AssertionError: System 
 properties invariant violated.
 [junit4:junit4] New keys:
 [junit4:junit4]   sun.awt.enableExtraMouseButtons=true
 [junit4:junit4]   sun.font.fontmanager=sun.font.CFontManager
 [junit4:junit4] 
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
 [junit4:junit4]  at java.lang.Thread.run(Thread.java:722)
 [junit4:junit4] 

[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-22 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481767#comment-13481767
 ] 

Alan Woodward commented on SOLR-1972:
-

Hi Shawn,

Metrics uses reservoir sampling to maintain its measurements, so the history is 
actually always a fixed size.  This is configurable, but defaults to 1024 
entries.  There's more information at 
http://metrics.codahale.com/manual/core/#histograms and 
http://www.johndcook.com/standard_deviation.html.

There are get75thpercentile and get999thpercentile methods out of the box, and 
you can also ask for values at arbitrary percentiles using getValue().
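
For readers unfamiliar with the technique, here is a minimal, self-contained sketch of the reservoir-sampling idea (Vitter's Algorithm R) that keeps a bounded, uniform sample of an unbounded stream; this is illustrative, not the Metrics library's actual implementation:

{code}
import java.util.Arrays;
import java.util.Random;

// Keeps a fixed-size uniform random sample of a stream of measurements, so
// percentile estimates use constant memory no matter how long the handler runs.
public class ReservoirSketch {
  private final long[] samples;
  private long count = 0;
  private final Random rnd = new Random();

  public ReservoirSketch(int size) {   // e.g. 1024, the default mentioned above
    this.samples = new long[size];
  }

  public synchronized void update(long value) {
    count++;
    if (count <= samples.length) {
      samples[(int) (count - 1)] = value;            // fill phase
    } else {
      long idx = (long) (rnd.nextDouble() * count);  // replace with probability size/count
      if (idx < samples.length) {
        samples[(int) idx] = value;
      }
    }
  }

  // Value at an arbitrary quantile in [0,1], e.g. 0.75 or 0.999.
  public synchronized long getValue(double quantile) {
    int n = (int) Math.min(count, samples.length);
    if (n == 0) return 0;
    long[] sorted = Arrays.copyOf(samples, n);
    Arrays.sort(sorted);
    return sorted[(int) Math.round(quantile * (n - 1))];
  }
}
{code}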

 Need additional query stats in admin interface - median, 95th and 99th 
 percentile
 -

 Key: SOLR-1972
 URL: https://issues.apache.org/jira/browse/SOLR-1972
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Shawn Heisey
Priority: Minor
 Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
 elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
 SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
 SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972.patch, 
 SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972-url_pattern.patch


 I would like to see more detailed query statistics from the admin GUI.  This 
 is what you can get now:
 requests : 809
 errors : 0
 timeouts : 0
 totalTime : 70053
 avgTimePerRequest : 86.59209
 avgRequestsPerSecond : 0.8148785 
 I'd like to see more data on the time per request - median, 95th percentile, 
 99th percentile, and any other statistical function that makes sense to 
 include.  In my environment, the first bunch of queries after startup tend to 
 take several seconds each.  I find that the average value tends to be useless 
 until it has several thousand queries under its belt and the caches are 
 thoroughly warmed.  The statistical functions I have mentioned would quickly 
 eliminate the influence of those initial slow queries.
 The system will have to store individual data about each query.  I don't know 
 if this is something Solr does already.  It would be nice to have a 
 configurable count of how many of the most recent data points are kept, to 
 control the amount of memory the feature uses.  The default value could be 
 something like 1024 or 4096.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec

2012-10-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481768#comment-13481768
 ] 

Michael McCandless commented on LUCENE-4498:


Looks good:

{noformat}
                Task   QPS base   StdDev   QPS comp   StdDev              Pct diff
             Respell      86.70   (3.0%)      84.04   (2.6%)   -3.1% (  -8% -  2%)
           OrHighMed      41.52   (5.8%)      40.44   (6.1%)   -2.6% ( -13% -  9%)
           OrHighLow      25.43   (6.0%)      24.77   (6.4%)   -2.6% ( -14% - 10%)
          OrHighHigh       9.38   (5.9%)       9.15   (6.4%)   -2.5% ( -14% - 10%)
            Wildcard      93.94   (4.1%)      92.36   (2.0%)   -1.7% (  -7% -  4%)
             MedTerm     211.10  (12.3%)     208.78  (13.4%)   -1.1% ( -23% - 27%)
              IntNRQ      10.74  (11.3%)      10.62   (7.8%)   -1.1% ( -18% - 20%)
            HighTerm      25.59  (14.0%)      25.35  (15.0%)   -1.0% ( -26% - 32%)
         MedSpanNear      13.77   (2.3%)      13.68   (1.6%)   -0.7% (  -4% -  3%)
    HighSloppyPhrase       4.09   (5.4%)       4.07   (5.2%)   -0.5% ( -10% - 10%)
        HighSpanNear       6.84   (2.9%)       6.81   (2.1%)   -0.4% (  -5% -  4%)
             Prefix3      17.81   (5.7%)      17.74   (1.5%)   -0.4% (  -7% -  7%)
              Fuzzy1      77.54   (2.5%)      77.25   (2.7%)   -0.4% (  -5% -  4%)
          AndHighLow     719.17   (2.7%)     716.49   (2.3%)   -0.4% (  -5% -  4%)
              Fuzzy2      68.94   (2.4%)      68.69   (2.8%)   -0.4% (  -5% -  5%)
         LowSpanNear      12.89   (1.8%)      12.85   (1.3%)   -0.3% (  -3% -  2%)
     MedSloppyPhrase      29.92   (3.4%)      29.85   (3.4%)   -0.2% (  -6% -  6%)
             LowTerm     500.58   (5.9%)     500.52   (7.0%)   -0.0% ( -12% - 13%)
     LowSloppyPhrase       9.57   (4.4%)       9.60   (4.3%)    0.4% (  -7% -  9%)
           LowPhrase       9.64   (2.8%)       9.70   (3.0%)    0.7% (  -4% -  6%)
          AndHighMed      86.68   (1.2%)      87.26   (1.2%)    0.7% (  -1% -  3%)
           MedPhrase       7.07   (4.3%)       7.15   (4.6%)    1.1% (  -7% - 10%)
          HighPhrase       4.79   (4.8%)       4.84   (5.6%)    1.1% (  -8% - 12%)
         AndHighHigh      25.81   (1.7%)      26.20   (1.2%)    1.5% (  -1% -  4%)
            PKLookup     193.31   (2.1%)     204.74   (1.6%)    5.9% (   2% -  9%)
{noformat}


 pulse docfreq=1 DOCS_ONLY for 4.1 codec
 ---

 Key: LUCENE-4498
 URL: https://issues.apache.org/jira/browse/LUCENE-4498
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Robert Muir
 Attachments: LUCENE-4498_lazy.patch, LUCENE-4498.patch, 
 LUCENE-4498.patch


 We have pulsing codec, but currently this has some downsides:
 * its very general, wrapping an arbitrary postingsformat and pulsing 
 everything in the postings for an arbitrary docfreq/totalTermFreq cutoff
 * reuse is hairy: because it specializes its enums based on these cutoffs, 
 when walking thru terms e.g. merging there is a lot of sophisticated stuff to 
 avoid the worst cases where we clone indexinputs for tons of terms.
 On the other hand the way the 4.1 codec encodes primary key fields is 
 pretty silly, we write the docStartFP vlong in the term dictionary metadata, 
 which tells us where to seek in the .doc to read our one lonely vint.
 I think its worth investigating that in the DOCS_ONLY docfreq=1 case, we just 
 write the lone doc delta where we would write docStartFP. 
 We can avoid the hairy reuse problem too, by just supporting this in 
 refillDocs() in BlockDocsEnum instead of specializing.
 This would remove the additional seek for primary key fields without really 
 any of the downsides of pulsing today.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec

2012-10-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481769#comment-13481769
 ] 

Robert Muir commented on LUCENE-4498:
-

This code can be simplified and generalized a bit: basically it just needs 
docFreq == 1. In that case totalTermFreq is redundant with freq, so we can 
e.g. pulse a term that appears 5 times but only in one doc.

I'll update the patch again.
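
To make the idea concrete, here is a hedged sketch of the writer-side branch (illustrative only, not the actual Lucene41PostingsWriter code; DataOutput is Lucene's abstract output class):

{code}
import java.io.IOException;
import org.apache.lucene.store.DataOutput;

// When a DOCS_ONLY term occurs in exactly one document (docFreq == 1),
// write the lone doc delta where the docStartFP pointer would normally go,
// so readers skip the extra seek into the .doc file.
final class SingletonTermSketch {
  static void writeTermMetadata(DataOutput termMeta, int docFreq,
                                long singletonDocID, long docStartFP)
      throws IOException {
    if (docFreq == 1) {
      termMeta.writeVLong(singletonDocID); // pulsed: no .doc entry at all
    } else {
      termMeta.writeVLong(docStartFP);     // normal: pointer into the .doc file
    }
  }
}
{code}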

 pulse docfreq=1 DOCS_ONLY for 4.1 codec
 ---

 Key: LUCENE-4498
 URL: https://issues.apache.org/jira/browse/LUCENE-4498
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Robert Muir
 Attachments: LUCENE-4498_lazy.patch, LUCENE-4498.patch, 
 LUCENE-4498.patch


 We have pulsing codec, but currently this has some downsides:
 * its very general, wrapping an arbitrary postingsformat and pulsing 
 everything in the postings for an arbitrary docfreq/totalTermFreq cutoff
 * reuse is hairy: because it specializes its enums based on these cutoffs, 
 when walking thru terms e.g. merging there is a lot of sophisticated stuff to 
 avoid the worst cases where we clone indexinputs for tons of terms.
 On the other hand the way the 4.1 codec encodes primary key fields is 
 pretty silly, we write the docStartFP vlong in the term dictionary metadata, 
 which tells us where to seek in the .doc to read our one lonely vint.
 I think its worth investigating that in the DOCS_ONLY docfreq=1 case, we just 
 write the lone doc delta where we would write docStartFP. 
 We can avoid the hairy reuse problem too, by just supporting this in 
 refillDocs() in BlockDocsEnum instead of specializing.
 This would remove the additional seek for primary key fields without really 
 any of the downsides of pulsing today.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError

2012-10-22 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481774#comment-13481774
 ] 

Dawid Weiss commented on SOLR-3849:
---

Thanks for digging, Uwe. So these property invariants are actually useful :) 
Since this breaks the tests, we should add these two keys to the ignore set 
(at least until Oracle fixes this?); a sketch of the resulting array follows 
the snippet below. The current set in LuceneTestCase:

{code}
  /**
   * These property keys will be ignored in verification of altered properties.
   * @see SystemPropertiesInvariantRule
   * @see #ruleChain
   * @see #classRules
   */
  private static final String [] IGNORED_INVARIANT_PROPERTIES = {
"user.timezone", "java.rmi.server.randomIDs"
  };
{code}
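
A sketch of the change being proposed, assuming we simply append the two offending AWT keys to that array (a hypothetical edit, not a committed patch):

{code}
  private static final String [] IGNORED_INVARIANT_PROPERTIES = {
    "user.timezone", "java.rmi.server.randomIDs",
    // set as a side effect of the AppleScript engine factory booting AWT:
    "sun.awt.enableExtraMouseButtons", "sun.font.fontmanager"
  };
{code}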

 ScriptEngineTest failure RE system properties and ThreadLeakError
 -

 Key: SOLR-3849
 URL: https://issues.apache.org/jira/browse/SOLR-3849
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 5.0
 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
Reporter: David Smiley
Assignee: Uwe Schindler
 Fix For: 4.0, 5.0

 Attachments: SOLR-3849.patch


 100% reproducible for me:
 solr$ ant test  -Dtestcase=ScriptEngineTest
 {noformat}
 [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4
 [junit4:junit4] Executing 1 suite with 1 JVM.
 [junit4:junit4] 
 [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest
 [junit4:junit4] OK  0.17s | ScriptEngineTest.testPut
 [junit4:junit4] OK  0.02s | ScriptEngineTest.testEvalReader
 [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby
 [junit4:junit4] Assumption #1: got: [null], expected: each not null
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testEvalText
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByExtension
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByName
 [junit4:junit4]   2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING 
 Will linger awaiting termination of 2 leaked thread(s).
 [junit4:junit4]   2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 
 thread leaked from SUITE scope at 
 org.apache.solr.update.processor.ScriptEngineTest: 
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll 
 Starting to interrupt leaked threads:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE 
 There are still zombie threads that couldn't be terminated:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 NOTE: test params are: codec=SimpleText, 
 sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, 
 timezone=America/Edmonton
 [junit4:junit4]   2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
 [junit4:junit4]   2 NOTE: All tests run in this JVM: [ScriptEngineTest]
 [junit4:junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true 
 -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton 
 -Dtests.file.encoding=UTF-8
 [junit4:junit4] ERROR   0.00s | ScriptEngineTest (suite) 
 [junit4:junit4] Throwable #1: java.lang.AssertionError: System 
 properties invariant violated.
 [junit4:junit4] New keys:
 [junit4:junit4]   sun.awt.enableExtraMouseButtons=true
 [junit4:junit4]   sun.font.fontmanager=sun.font.CFontManager
 [junit4:junit4] 
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 [junit4:junit4]  at 
 

[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError

2012-10-22 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481783#comment-13481783
 ] 

Uwe Schindler commented on SOLR-3849:
-

Can we ignore those *only* for this test?

 ScriptEngineTest failure RE system properties and ThreadLeakError
 -

 Key: SOLR-3849
 URL: https://issues.apache.org/jira/browse/SOLR-3849
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 5.0
 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
Reporter: David Smiley
Assignee: Uwe Schindler
 Fix For: 4.0, 5.0

 Attachments: SOLR-3849.patch


 100% reproducible for me:
 solr$ ant test  -Dtestcase=ScriptEngineTest
 {noformat}
 [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4
 [junit4:junit4] Executing 1 suite with 1 JVM.
 [junit4:junit4] 
 [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest
 [junit4:junit4] OK  0.17s | ScriptEngineTest.testPut
 [junit4:junit4] OK  0.02s | ScriptEngineTest.testEvalReader
 [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby
 [junit4:junit4] Assumption #1: got: [null], expected: each not null
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testEvalText
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByExtension
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByName
 [junit4:junit4]   2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING 
 Will linger awaiting termination of 2 leaked thread(s).
 [junit4:junit4]   2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 
 thread leaked from SUITE scope at 
 org.apache.solr.update.processor.ScriptEngineTest: 
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll 
 Starting to interrupt leaked threads:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE 
 There are still zombie threads that couldn't be terminated:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 NOTE: test params are: codec=SimpleText, 
 sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, 
 timezone=America/Edmonton
 [junit4:junit4]   2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
 [junit4:junit4]   2 NOTE: All tests run in this JVM: [ScriptEngineTest]
 [junit4:junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true 
 -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton 
 -Dtests.file.encoding=UTF-8
 [junit4:junit4] ERROR   0.00s | ScriptEngineTest (suite) 
 [junit4:junit4] Throwable #1: java.lang.AssertionError: System 
 properties invariant violated.
 [junit4:junit4] New keys:
 [junit4:junit4]   sun.awt.enableExtraMouseButtons=true
 [junit4:junit4]   sun.font.fontmanager=sun.font.CFontManager
 [junit4:junit4] 
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
 [junit4:junit4]  at java.lang.Thread.run(Thread.java:722)
 [junit4:junit4] Throwable #2: 
 

[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError

2012-10-22 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481786#comment-13481786
 ] 

Dawid Weiss commented on SOLR-3849:
---

I think you'd have to redefine the entire rule chain by shadowing the field. 
It's JUnit, not me -- sorry.

 ScriptEngineTest failure RE system properties and ThreadLeakError
 -

 Key: SOLR-3849
 URL: https://issues.apache.org/jira/browse/SOLR-3849
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 5.0
 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
Reporter: David Smiley
Assignee: Uwe Schindler
 Fix For: 4.0, 5.0

 Attachments: SOLR-3849.patch


 100% reproducible for me:
 solr$ ant test  -Dtestcase=ScriptEngineTest
 {noformat}
 [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4
 [junit4:junit4] Executing 1 suite with 1 JVM.
 [junit4:junit4] 
 [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest
 [junit4:junit4] OK  0.17s | ScriptEngineTest.testPut
 [junit4:junit4] OK  0.02s | ScriptEngineTest.testEvalReader
 [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby
 [junit4:junit4] Assumption #1: got: [null], expected: each not null
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testEvalText
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByExtension
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByName
 [junit4:junit4]   2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING 
 Will linger awaiting termination of 2 leaked thread(s).
 [junit4:junit4]   2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 
 thread leaked from SUITE scope at 
 org.apache.solr.update.processor.ScriptEngineTest: 
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll 
 Starting to interrupt leaked threads:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE 
 There are still zombie threads that couldn't be terminated:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 NOTE: test params are: codec=SimpleText, 
 sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, 
 timezone=America/Edmonton
 [junit4:junit4]   2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
 [junit4:junit4]   2 NOTE: All tests run in this JVM: [ScriptEngineTest]
 [junit4:junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true 
 -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton 
 -Dtests.file.encoding=UTF-8
 [junit4:junit4] ERROR   0.00s | ScriptEngineTest (suite) 
 [junit4:junit4] Throwable #1: java.lang.AssertionError: System 
 properties invariant violated.
 [junit4:junit4] New keys:
 [junit4:junit4]   sun.awt.enableExtraMouseButtons=true
 [junit4:junit4]   sun.font.fontmanager=sun.font.CFontManager
 [junit4:junit4] 
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
 [junit4:junit4]  at java.lang.Thread.run(Thread.java:722)
 [junit4:junit4] 

[jira] [Updated] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec

2012-10-22 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4498:


Attachment: LUCENE-4498.patch

Here's the docFreq=1 patch. I like this a lot better; I don't think it really 
buys us much, but it makes the code simpler and easier to understand.

 pulse docfreq=1 DOCS_ONLY for 4.1 codec
 ---

 Key: LUCENE-4498
 URL: https://issues.apache.org/jira/browse/LUCENE-4498
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Robert Muir
 Attachments: LUCENE-4498_lazy.patch, LUCENE-4498.patch, 
 LUCENE-4498.patch, LUCENE-4498.patch


 We have pulsing codec, but currently this has some downsides:
 * its very general, wrapping an arbitrary postingsformat and pulsing 
 everything in the postings for an arbitrary docfreq/totalTermFreq cutoff
 * reuse is hairy: because it specializes its enums based on these cutoffs, 
 when walking thru terms e.g. merging there is a lot of sophisticated stuff to 
 avoid the worst cases where we clone indexinputs for tons of terms.
 On the other hand the way the 4.1 codec encodes primary key fields is 
 pretty silly, we write the docStartFP vlong in the term dictionary metadata, 
 which tells us where to seek in the .doc to read our one lonely vint.
 I think its worth investigating that in the DOCS_ONLY docfreq=1 case, we just 
 write the lone doc delta where we would write docStartFP. 
 We can avoid the hairy reuse problem too, by just supporting this in 
 refillDocs() in BlockDocsEnum instead of specializing.
 This would remove the additional seek for primary key fields without really 
 any of the downsides of pulsing today.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-22 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481797#comment-13481797
 ] 

Shawn Heisey commented on SOLR-1972:


I have answers to some of my questions.  There is a 75th percentile.  I added 
the 75th and 999th to what you had, and it seems to display the stats page a 
lot faster than my patch did.  We'll see what happens when it gets a few 
thousand queries under its belt, though.  I was running the old patch with 
16384 samples, and I put the stats on three handlers, so it was having to copy 
arrays of 16384 longs a total of six times every time I refreshed the stats 
page.  I may also add the 98th percentile.  It may be a good idea to make each 
percentile point configurable in solrconfig.xml.  So far I have not yet figured 
out whether it is possible to limit the number of samples stored, or anything 
else which can limit the amount of memory required.

The names for the average req/s over the last 5 and 15 minutes are REALLY long. 
 Unless you have a high res display (1920 pixels wide) and maximize the window, 
the names overlap the values.  If I think of a reasonable way to shorten those, 
I will.  I ran into it myself when making my branch_4x patch.


 Need additional query stats in admin interface - median, 95th and 99th 
 percentile
 -

 Key: SOLR-1972
 URL: https://issues.apache.org/jira/browse/SOLR-1972
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Shawn Heisey
Priority: Minor
 Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
 elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
 SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
 SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972.patch, 
 SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972-url_pattern.patch


 I would like to see more detailed query statistics from the admin GUI.  This 
 is what you can get now:
 requests : 809
 errors : 0
 timeouts : 0
 totalTime : 70053
 avgTimePerRequest : 86.59209
 avgRequestsPerSecond : 0.8148785 
 I'd like to see more data on the time per request - median, 95th percentile, 
 99th percentile, and any other statistical function that makes sense to 
 include.  In my environment, the first bunch of queries after startup tend to 
 take several seconds each.  I find that the average value tends to be useless 
 until it has several thousand queries under its belt and the caches are 
 thoroughly warmed.  The statistical functions I have mentioned would quickly 
 eliminate the influence of those initial slow queries.
 The system will have to store individual data about each query.  I don't know 
 if this is something Solr does already.  It would be nice to have a 
 configurable count of how many of the most recent data points are kept, to 
 control the amount of memory the feature uses.  The default value could be 
 something like 1024 or 4096.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2012-10-22 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481809#comment-13481809
 ] 

Shawn Heisey commented on SOLR-1972:


I didn't see your reply about the reservoir size until after I'd already 
submitted mine. If I want to increase or decrease that size, how do I do that? 
So far, poking around the javadocs and searching Google hasn't turned anything up.


 Need additional query stats in admin interface - median, 95th and 99th 
 percentile
 -

 Key: SOLR-1972
 URL: https://issues.apache.org/jira/browse/SOLR-1972
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Shawn Heisey
Priority: Minor
 Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
 elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
 SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
 SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972.patch, 
 SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972-url_pattern.patch


 I would like to see more detailed query statistics from the admin GUI.  This 
 is what you can get now:
 requests : 809
 errors : 0
 timeouts : 0
 totalTime : 70053
 avgTimePerRequest : 86.59209
 avgRequestsPerSecond : 0.8148785 
 I'd like to see more data on the time per request - median, 95th percentile, 
 99th percentile, and any other statistical function that makes sense to 
 include.  In my environment, the first bunch of queries after startup tend to 
 take several seconds each.  I find that the average value tends to be useless 
 until it has several thousand queries under its belt and the caches are 
 thoroughly warmed.  The statistical functions I have mentioned would quickly 
 eliminate the influence of those initial slow queries.
 The system will have to store individual data about each query.  I don't know 
 if this is something Solr does already.  It would be nice to have a 
 configurable count of how many of the most recent data points are kept, to 
 control the amount of memory the feature uses.  The default value could be 
 something like 1024 or 4096.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec

2012-10-22 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4498:


Attachment: LUCENE-4498.patch

Patch with file format docs and comment fixes.

I think this is ready to go.

 pulse docfreq=1 DOCS_ONLY for 4.1 codec
 ---

 Key: LUCENE-4498
 URL: https://issues.apache.org/jira/browse/LUCENE-4498
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Robert Muir
 Attachments: LUCENE-4498_lazy.patch, LUCENE-4498.patch, 
 LUCENE-4498.patch, LUCENE-4498.patch, LUCENE-4498.patch


 We have pulsing codec, but currently this has some downsides:
 * its very general, wrapping an arbitrary postingsformat and pulsing 
 everything in the postings for an arbitrary docfreq/totalTermFreq cutoff
 * reuse is hairy: because it specializes its enums based on these cutoffs, 
 when walking thru terms e.g. merging there is a lot of sophisticated stuff to 
 avoid the worst cases where we clone indexinputs for tons of terms.
 On the other hand the way the 4.1 codec encodes primary key fields is 
 pretty silly, we write the docStartFP vlong in the term dictionary metadata, 
 which tells us where to seek in the .doc to read our one lonely vint.
 I think its worth investigating that in the DOCS_ONLY docfreq=1 case, we just 
 write the lone doc delta where we would write docStartFP. 
 We can avoid the hairy reuse problem too, by just supporting this in 
 refillDocs() in BlockDocsEnum instead of specializing.
 This would remove the additional seek for primary key fields without really 
 any of the downsides of pulsing today.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec

2012-10-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481824#comment-13481824
 ] 

Michael McCandless commented on LUCENE-4498:


+1

Very nice to fold pulsing into the default PF!

 pulse docfreq=1 DOCS_ONLY for 4.1 codec
 ---

 Key: LUCENE-4498
 URL: https://issues.apache.org/jira/browse/LUCENE-4498
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Robert Muir
 Attachments: LUCENE-4498_lazy.patch, LUCENE-4498.patch, 
 LUCENE-4498.patch, LUCENE-4498.patch, LUCENE-4498.patch


 We have pulsing codec, but currently this has some downsides:
 * its very general, wrapping an arbitrary postingsformat and pulsing 
 everything in the postings for an arbitrary docfreq/totalTermFreq cutoff
 * reuse is hairy: because it specializes its enums based on these cutoffs, 
 when walking thru terms e.g. merging there is a lot of sophisticated stuff to 
 avoid the worst cases where we clone indexinputs for tons of terms.
 On the other hand the way the 4.1 codec encodes primary key fields is 
 pretty silly, we write the docStartFP vlong in the term dictionary metadata, 
 which tells us where to seek in the .doc to read our one lonely vint.
 I think its worth investigating that in the DOCS_ONLY docfreq=1 case, we just 
 write the lone doc delta where we would write docStartFP. 
 We can avoid the hairy reuse problem too, by just supporting this in 
 refillDocs() in BlockDocsEnum instead of specializing.
 This would remove the additional seek for primary key fields without really 
 any of the downsides of pulsing today.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


