[jira] [Updated] (SOLR-3973) Cross facet
[ https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhengBowen updated SOLR-3973:
-----------------------------
    Attachment: crossfacet.patch

The patch for the cross facet feature.

Cross facet
-----------
                Key: SOLR-3973
                URL: https://issues.apache.org/jira/browse/SOLR-3973
            Project: Solr
         Issue Type: Improvement
         Components: SearchComponents - other
   Affects Versions: 3.5
           Reporter: ZhengBowen
             Labels: cross, facet, solr
            Fix For: 3.5
        Attachments: crossfacet.patch

We often come across the need for cross (multi-field) faceting, for example the SQL statement "select count( * ) from table1 group by A,B". We made a small modification to FacetComponent so that it supports this kind of cross facet.

Request parameters are as follows:

start=0&rows=0&q=*:*&facet=true&facet.field=user_city&facet.field=user_province&facet.limit=10&facet.cross=true&facet.cross.sep=,

The effect of the new feature is as follows:

{code:xml}
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">84</int>
    <lst name="params">
      <str name="facet.cross">true</str>
      <str name="facet">true</str>
      <str name="shards">10.253.93.71:62511/solr,10.253.93.71:62512/solr,10.253.93.71:62513/solr,10.253.93.71:62514/solr,</str>
      <str name="facet.cross.sep">,</str>
      <str name="start">0</str>
      <str name="q">*:*</str>
      <str name="facet.limit">10</str>
      <arr name="facet.field">
        <str>user_city</str>
        <str>user_province</str>
      </arr>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="user_city,user_province">
        <int name="Beijing,Beijing">16852</int>
        <int name="Shanghai,Shanghai">16787</int>
        <int name="Guangzhou,Gunagdong">12950</int>
        <int name="Shenzheng,Guangdong">11667</int>
        <int name="Hangzhou,Zhejiang">9997</int>
        <int name="Chongqing,Chongqing">7624</int>
        <int name="Chengdu,Sichuan">7082</int>
        <int name="Wuhan,Hubei">6894</int>
        <int name="Suzhou,Jiangsu">6528</int>
      </lst>
    </lst>
    <lst name="facet_numTerms"/>
    <lst name="facet_dates"/>
    <lst name="facet_ranges"/>
  </lst>
</response>
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
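For anyone who wants to try the patch from SolrJ rather than with a raw URL, the request above could be built roughly as follows. This is only a sketch: it assumes crossfacet.patch is applied (facet.cross and facet.cross.sep are parameters introduced by the patch, not stock Solr), the server URL is a placeholder, and HttpSolrServer is the SolrJ 4.x client (3.x users would use CommonsHttpSolrServer instead).

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CrossFacetRequest {
  public static void main(String[] args) throws Exception {
    // Placeholder URL; point this at a server that has crossfacet.patch applied.
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

    SolrQuery query = new SolrQuery("*:*");
    query.setStart(0);
    query.setRows(0);
    query.setFacet(true);
    query.addFacetField("user_city", "user_province");
    query.setFacetLimit(10);
    // Parameters added by the patch (assumption based on the issue description).
    query.set("facet.cross", "true");
    query.set("facet.cross.sep", ",");

    QueryResponse rsp = server.query(query);
    System.out.println(rsp.getFacetFields());
  }
}
{code}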
[jira] [Commented] (SOLR-3963) SOLR: map() does not allow passing recip() sub-functions
[ https://issues.apache.org/jira/browse/SOLR-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481191#comment-13481191 ]

Bill Bell commented on SOLR-3963:
----------------------------------
Yep. Enhancement request.

Bill Bell
Sent from mobile

SOLR: map() does not allow passing recip() sub-functions
---------------------------------------------------------
                Key: SOLR-3963
                URL: https://issues.apache.org/jira/browse/SOLR-3963
            Project: Solr
         Issue Type: Bug
   Affects Versions: 4.0
           Reporter: Bill Bell

I want to do:

boost=map(achievement_count,1,1000,recip(achievement_count,-.5,10,25),1)

I want to return recip(achievement_count,-.5,10,25) if achievement_count is between 1 and 1,000. For any other value I want to return 1. I cannot get it to work; I get the error below. Interestingly, this does work:

boost=recip(map(achievement_count,0,0,-200),-.5,10,25)

It almost appears that map() cannot take a function.

{noformat}
Specified argument was out of the range of valid values. Parameter name: value
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception Details: System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values. Parameter name: value
Source Error: An unhandled exception was generated during the execution of the current web request. Information regarding the origin and location of the exception can be identified using the exception stack trace below.

Stack Trace:
[ArgumentOutOfRangeException: Specified argument was out of the range of valid values. Parameter name: value]
   System.Web.HttpResponse.set_StatusDescription(String value) +5200522
   FacilityService.Controllers.FacilityController.ActionCompleted(String actionName, IFacilityResults results) +265
   FacilityService.Controllers.FacilityController.SearchByPointCompleted(IFacilityResults results) +25
   lambda_method(Closure , ControllerBase , Object[] ) +114
   System.Web.Mvc.Async.<>c__DisplayClass7.<BeginExecute>b__5(IAsyncResult asyncResult) +283
   System.Web.Mvc.Async.<>c__DisplayClass41.<BeginInvokeAsynchronousActionMethod>b__40(IAsyncResult asyncResult) +22
   System.Web.Mvc.Async.<>c__DisplayClass3b.<BeginInvokeActionMethodWithFilters>b__35() +120
   System.Web.Mvc.Async.<>c__DisplayClass51.<InvokeActionMethodFilterAsynchronously>b__4b() +452
   System.Web.Mvc.Async.<>c__DisplayClass39.<BeginInvokeActionMethodWithFilters>b__38(IAsyncResult asyncResult) +15
   System.Web.Mvc.Async.<>c__DisplayClass2c.<BeginInvokeAction>b__22() +33
   System.Web.Mvc.Async.<>c__DisplayClass27.<BeginInvokeAction>b__24(IAsyncResult asyncResult) +240
   System.Web.Mvc.<>c__DisplayClass19.<BeginExecuteCore>b__14(IAsyncResult asyncResult) +28
   System.Web.Mvc.Async.<>c__DisplayClass4.<MakeVoidDelegate>b__3(IAsyncResult ar) +15
   System.Web.Mvc.AsyncController.EndExecuteCore(IAsyncResult asyncResult) +63
   System.Web.Mvc.Async.<>c__DisplayClass4.<MakeVoidDelegate>b__3(IAsyncResult ar) +15
   System.Web.Mvc.<>c__DisplayClassb.<BeginProcessRequest>b__4(IAsyncResult asyncResult) +42
   System.Web.Mvc.Async.<>c__DisplayClass4.<MakeVoidDelegate>b__3(IAsyncResult ar) +15
   System.Web.CallHandlerExecutionStep.OnAsyncHandlerCompletion(IAsyncResult ar) +282
{noformat}
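A minimal SolrJ sketch of the workaround Bill describes, i.e. nesting map() inside recip() rather than the other way around. This is an illustration only: it assumes a request handler that honors the boost parameter (e.g. edismax) and a numeric achievement_count field, and it is not the enhancement asked for in this issue.

{code:java}
import org.apache.solr.client.solrj.SolrQuery;

public class BoostWorkaround {
  public static void main(String[] args) {
    SolrQuery query = new SolrQuery("*:*");
    query.set("defType", "edismax");
    // Desired form (currently rejected): map() taking a recip() sub-function as its target value.
    // query.set("boost", "map(achievement_count,1,1000,recip(achievement_count,-.5,10,25),1)");
    // Working form reported in the issue: nest map() inside recip() instead.
    query.set("boost", "recip(map(achievement_count,0,0,-200),-.5,10,25)");
    System.out.println(query);  // prints the assembled request parameters
  }
}
{code}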
[jira] [Updated] (SOLR-3973) Cross facet
[ https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhengBowen updated SOLR-3973:
-----------------------------
    Description:

We often come across the need for cross (multi-field) faceting, for example the SQL statement "select count( * ) from table1 group by A,B". We made a small modification to FacetComponent so that it supports this kind of cross facet.

Request parameters are as follows:

start=0&rows=0&q=*:*&facet=true&facet.field=user_city&facet.field=user_province&facet.limit=10&facet.cross=true&facet.cross.sep=,

The original (per-field) output is as follows:

{code:xml}
<result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="user_city">
      <int name="Beijing">16852</int>
      <int name="ShangHai">16787</int>
      <int name="Gunagzhou">12950</int>
      <int name="Shenzhen">11667</int>
      <int name="Hangzhou">9997</int>
      <int name="Chongqing">7624</int>
      <int name="Chengdu">7082</int>
      <int name="Wuhan">6894</int>
      <int name="Suzhou">6528</int>
      <int name="Tianjin">5822</int>
    </lst>
    <lst name="user_province">
      <int name="Gunagdong">48621</int>
      <int name="Zhengjiang">34634</int>
      <int name="Jiangsu">28748</int>
      <int name="Shandong">20389</int>
      <int name="Fujian">18508</int>
      <int name="Beijing">16852</int>
      <int name="Shanghai">16787</int>
      <int name="Hubei">15227</int>
      <int name="Sichuan">15112</int>
      <int name="Hebei">13793</int>
    </lst>
  </lst>
  <lst name="facet_numTerms"/>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>
{code}

With the new feature the output is as follows:

{code:xml}
<result name="response" numFound="479140" start="0" sum="0.0" max="-Infinity" min="Infinity"/>
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="user_city,user_province">
      <int name="Beijing,Beijing">16852</int>
      <int name="Shanghai,Shanghai">16787</int>
      <int name="Guangzhou,Gunagdong">12950</int>
      <int name="Shenzheng,Guangdong">11667</int>
      <int name="Hangzhou,Zhejiang">9997</int>
      <int name="Chongqing,Chongqing">7624</int>
      <int name="Chengdu,Sichuan">7082</int>
      <int name="Wuhan,Hubei">6894</int>
      <int name="Suzhou,Jiangsu">6528</int>
    </lst>
  </lst>
  <lst name="facet_numTerms"/>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>
{code}
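To make the shape of the new response concrete, a SolrJ client could read the cross-facet counts roughly as below. This is a sketch under the assumption, suggested by the sample output, that the patched FacetComponent returns the cross facet under the joined field name user_city,user_province and joins the values with the facet.cross.sep separator.

{code:java}
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CrossFacetReader {
  // rsp is the QueryResponse from the cross-facet query shown earlier.
  static void printCrossCounts(QueryResponse rsp) {
    // Assumption: the patched component names the facet by joining the fields with facet.cross.sep.
    FacetField cross = rsp.getFacetField("user_city,user_province");
    if (cross == null || cross.getValues() == null) {
      return;
    }
    for (FacetField.Count c : cross.getValues()) {
      String[] parts = c.getName().split(",");   // "Beijing,Beijing" -> [city, province]
      System.out.println("city=" + parts[0] + " province=" + parts[1] + " count=" + c.getCount());
    }
  }
}
{code}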
[jira] [Updated] (SOLR-3973) Cross facet
[ https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhengBowen updated SOLR-3973:
-----------------------------
    Description: changed the SQL example from "select count( * ) from table1 group by A,B" to "select count( * ) from table group by A,B"; the request parameters and the sample outputs are unchanged.
[jira] [Updated] (SOLR-3973) Cross facet, facet on multiple columns.
[ https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhengBowen updated SOLR-3973:
-----------------------------
    Summary: Cross facet, facet on multiple columns.  (was: Cross facet)
[jira] [Updated] (SOLR-3973) Cross facet, faceting on multiple columns.
[ https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhengBowen updated SOLR-3973:
-----------------------------
    Summary: Cross facet, faceting on multiple columns.  (was: Cross facet, facet on multiple columns.)
[jira] [Updated] (SOLR-3973) Cross facet, faceting on multiple columns.
[ https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhengBowen updated SOLR-3973:
-----------------------------
    Attachment: (was: crossfacet.patch)
[jira] [Updated] (SOLR-3973) Cross facet, faceting on multiple columns.
[ https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhengBowen updated SOLR-3973:
-----------------------------
    Attachment: crossfacet.patch

The patch for cross facet.
[jira] [Updated] (SOLR-3973) Cross facet, faceting on multiple columns.
[ https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhengBowen updated SOLR-3973:
-----------------------------
    Description: appended the incomplete phrase "you can facet on" to the sentence about FacetComponent; the request parameters and the sample outputs are unchanged.
[jira] [Updated] (SOLR-3973) Cross facet, faceting on multiple columns.
[ https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhengBowen updated SOLR-3973:
-----------------------------
    Description: completed the new sentence so that it now reads "you can facet on multiple columns, and get the count result of the multi-faceted cross"; the request parameters and the sample outputs are unchanged.
[jira] [Updated] (SOLR-3973) Cross facet, faceting on multiple columns.
[ https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhengBowen updated SOLR-3973:
-----------------------------
    Comment: was deleted  (was: the patche of cross facet.)
[jira] [Commented] (SOLR-3973) Cross facet, faceting on multiple columns.
[ https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481198#comment-13481198 ]

ZhengBowen commented on SOLR-3973:
----------------------------------
We often come across the need for cross faceting, for example the SQL statement "select count( * ) from table group by A,B". So this patch supports faceting on multiple columns, and you can get the counts of the multi-faceted cross.

I work at Alipay in China; we use Solr to build a multidimensional analysis platform for massive data.
[jira] [Updated] (SOLR-3973) Cross facet, faceting on multiple columns.
[ https://issues.apache.org/jira/browse/SOLR-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhengBowen updated SOLR-3973:
-----------------------------
    Comment: was deleted  (was: the patch of cross facet.)
Re: [JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0-ea-b58) - Build # 1917 - Failure!
It's a JVM crash, you can see the hs dump above in the logs (a copy below). It's a bug on my part that this doesn't complete with a more informational exception message though -- I'll take a look and fix for the next release.

Dawid

[junit4:junit4] JVM J1: stdout (verbatim)
[junit4:junit4] #
[junit4:junit4] # A fatal error has been detected by the Java Runtime Environment:
[junit4:junit4] #
[junit4:junit4] # SIGSEGV (0xb) at pc=0x7fd3893f9058, pid=13675, tid=140546309019392
[junit4:junit4] #
[junit4:junit4] # JRE version: Java(TM) SE Runtime Environment (8.0-b58)
[junit4:junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b02 mixed mode linux-amd64 compressed oops)
[junit4:junit4] # Problematic frame:
[junit4:junit4] # V [libjvm.so+0x7da058] ParRootScanWithBarrierTwoGensClosure::do_oop(unsigned int*)+0x78
[junit4:junit4] #
[junit4:junit4] # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
[junit4:junit4] #
[junit4:junit4] # An error report file with more information is saved as:
[junit4:junit4] # /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build/analysis/common/test/J1/hs_err_pid13675.log
[junit4:junit4] #
[junit4:junit4] # If you would like to submit a bug report, please visit:
[junit4:junit4] #   http://bugreport.sun.com/bugreport/crash.jsp
[junit4:junit4] #
[junit4:junit4] JVM J1: EOF
[jira] [Commented] (LUCENE-4476) maven deployment scripts dont work (except from the machine you made the RC from)
[ https://issues.apache.org/jira/browse/LUCENE-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481226#comment-13481226 ]

Uwe Schindler commented on LUCENE-4476:
---------------------------------------
bq. Does this also happen on windows if you sign artifacts with your GPG key?

Definitely not! The password is hidden! This is clearly a cygwin issue (and only if you use the cygwin console window). With the official Windows 7 cmd.exe in the official Windows console window the password is not shown. I never use cygwin for building on Windows -- why do you, Steven? To run ant and build artifacts, a plain cmd.exe is fine.

maven deployment scripts dont work (except from the machine you made the RC from)
----------------------------------------------------------------------------------
                Key: LUCENE-4476
                URL: https://issues.apache.org/jira/browse/LUCENE-4476
            Project: Lucene - Core
         Issue Type: Bug
           Reporter: Robert Muir
        Attachments: LUCENE-4476.patch, LUCENE-4476.patch, LUCENE-4476.patch

Currently the maven process described in http://wiki.apache.org/lucene-java/PublishMavenArtifacts does not work (on mac). It worked fine for the 4.0-alpha and 4.0-beta releases. NOTE: This appears to be working on linux so I am going with that. But it seems strange that it doesn't work on mac.

{noformat}
[artifact:install-provider] Installing provider: org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-7:runtime
[artifact:pom] Downloading: org/apache/lucene/lucene-parent/4.0.0/lucene-parent-4.0.0.pom from repository sonatype.releases at http://oss.sonatype.org/content/repositories/releases
[artifact:pom] Unable to locate resource in repository
[artifact:pom] [INFO] Unable to find resource 'org.apache.lucene:lucene-parent:pom:4.0.0' in repository sonatype.releases (http://oss.sonatype.org/content/repositories/releases)
[artifact:pom] Downloading: org/apache/lucene/lucene-parent/4.0.0/lucene-parent-4.0.0.pom from repository central at http://repo1.maven.org/maven2
[artifact:pom] Unable to locate resource in repository
[artifact:pom] [INFO] Unable to find resource 'org.apache.lucene:lucene-parent:pom:4.0.0' in repository central (http://repo1.maven.org/maven2)
[artifact:pom] An error has occurred while processing the Maven artifact tasks.
[artifact:pom]  Diagnosis:
[artifact:pom]
[artifact:pom] Unable to initialize POM lucene-test-framework-4.0.0.pom: Cannot find parent: org.apache.lucene:lucene-parent for project: org.apache.lucene:lucene-test-framework:jar:null for project org.apache.lucene:lucene-test-framework:jar:null
[artifact:pom] Unable to download the artifact from any repository

BUILD FAILED
{noformat}
RE: Lucene build ivy problems
It only downloads on the first try; later builds never download anything unless dependencies have changed. And if you were able to *not* download them, your build would not succeed.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Monday, October 22, 2012 5:03 AM
To: dev@lucene.apache.org
Subject: Lucene build ivy problems

If I have all of the dependencies downloaded, how can I tell the build to skip checking the repositories? I'm working on a somewhat dodgy internet connection.

I ran 'ant example' a hundred times. On the 101st, I had an internet outage and the Ivy stuff blocked. Ever after that the resolver hung, so I had to remove the home/.ivy2 directory and start over. Now all of the dependencies are slowly downloading again over a dodgy internet cafe connection.

Is there some flag for the ant build that says "just pretend everything is downloaded"?
[jira] [Commented] (LUCENE-4476) maven deployment scripts dont work (except from the machine you made the RC from)
[ https://issues.apache.org/jira/browse/LUCENE-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481228#comment-13481228 ]

Uwe Schindler commented on LUCENE-4476:
---------------------------------------
Ah, also: if you run bash.exe in the official Windows console window (not cygwin's own), it also works. It's a bug of the dumb cygwin-internal console window only (why do they have it?) - sorry, I have to rant about Cygwin; I use it, too, but only to execute find/sed/grep...
Re: [JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0-ea-b58) - Build # 1917 - Failure!
Ok, everything is fine -- the default action on seeing forked-process output is to pipe it to the logs and warn, but not to throw an exception (which is what the build did). The problem was that people used various -D options for debugging and profiling which resulted in legitimate output on the process output descriptors (bypassing the System.* stream redirectors). I've changed the message a bit to indicate whether any output was emitted when the exit status != 0.

Dawid

On Mon, Oct 22, 2012 at 8:47 AM, Dawid Weiss <dawid.we...@cs.put.poznan.pl> wrote:
It's a JVM crash, you can see the hs dump above in the logs (a copy below). It's a bug on my part that this doesn't complete with a more informational exception message though -- I'll take a look and fix for the next release.
Dawid
[jira] [Updated] (SOLR-3964) Solr does not return error, even though create collection unsuccessfully
[ https://issues.apache.org/jira/browse/SOLR-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

milesli updated SOLR-3964:
--------------------------
    Description:
Solr does not return an error even when creating or deleting a collection fails;
even when the request URL is incorrect (example: http://127.0.0.1:8983/solr/admin/collections?action=CREATE&name=tenancy_miles&numShards=3&numReplicas=2&collection.configName=myconf);
even when the collection name passed already exists.

Solr does not return error, even though create collection unsuccessfully
-------------------------------------------------------------------------
                Key: SOLR-3964
                URL: https://issues.apache.org/jira/browse/SOLR-3964
            Project: Solr
         Issue Type: Bug
         Components: SolrCloud
   Affects Versions: 4.0
           Reporter: milesli
             Labels: lack, message, response
  Original Estimate: 6h
  Remaining Estimate: 6h
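Until this is addressed, a client has to inspect the Collections API response itself. Below is a minimal sketch (plain java.net, no SolrJ) that issues the CREATE call and prints the HTTP status and body so that failures are at least visible; the host, collection name, and parameters are taken from the example request above and are placeholders.

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class CreateCollectionCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder URL taken from the issue's example request.
    URL url = new URL("http://127.0.0.1:8983/solr/admin/collections"
        + "?action=CREATE&name=tenancy_miles&numShards=3"
        + "&numReplicas=2&collection.configName=myconf");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    System.out.println("HTTP status: " + conn.getResponseCode());
    BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), "UTF-8"));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        // Inspect the body; SOLR-3964 is about errors not being reported here.
        System.out.println(line);
      }
    } finally {
      in.close();
    }
  }
}
{code}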
[jira] [Updated] (SOLR-3964) Solr does not return error, even though create collection unsuccessfully
[ https://issues.apache.org/jira/browse/SOLR-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] milesli updated SOLR-3964: -- Priority: Major (was: Minor) Solr does not return error, even though create collection unsuccessfully - Key: SOLR-3964 URL: https://issues.apache.org/jira/browse/SOLR-3964 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.0 Reporter: milesli Labels: lack, message, response Original Estimate: 6h Remaining Estimate: 6h Solr does not return error, even though create collection unsuccessfully; even though the request URL is incorrect; (example: http://127.0.0.1:8983/solr/admin/collections?action=CREATEname=tenancy_milesnumShards=3numReplicas=2collection.configName=myconf) even though pass the collection name already exists; -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4494) Add phoenetic algorithm Match Rating approach to lucene
[ https://issues.apache.org/jira/browse/LUCENE-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481271#comment-13481271 ] Colm Rice commented on LUCENE-4494: --- Thanks Steve. Glad to be able to contribute. The first of many :-) Thanks for the link, I'll swot up on it. Hi Lance, yes that's the one. I wrote that article btw! Add phoenetic algorithm Match Rating approach to lucene --- Key: LUCENE-4494 URL: https://issues.apache.org/jira/browse/LUCENE-4494 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0-ALPHA Reporter: Colm Rice Priority: Minor Fix For: 4.1 Original Estimate: 168h Remaining Estimate: 168h I want to add MatchRatingApproach algorithm to the Lucene project. What I have at the moment is a class called org.apache.lucene.analysis.phoenetic.MatchRatingApproach implementing StringEncoder I have a pretty comprehensive test file located at: org.apache.lucene.analysis.phonetic.MatchRatingApproachTests It's not exactly existing pattern so I'm going to need a bit of advice here. Thanks! Feel free to email. FYI: It my first contribitution so be gentle :-) C# is my native. Reference: http://en.wikipedia.org/wiki/Match_rating_approach -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
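For anyone curious what the Match Rating Approach codex actually does, here is a small standalone sketch of the encoding rules described on the Wikipedia page linked above (strip non-letters, drop vowels except a leading one, collapse doubled letters, keep the first three and last three characters). It is not Colm's attached class, ignores the comparison/similarity half of the algorithm, and simplifies the rule ordering:
{code:java}
import java.util.Locale;

public class MatchRatingSketch {
  /** Returns a rough MRA codex for a name; empty input yields an empty codex. */
  static String encode(String name) {
    if (name == null) return "";
    String s = name.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
    if (s.isEmpty()) return "";

    // Rule 1: delete vowels unless the vowel begins the word.
    StringBuilder noVowels = new StringBuilder().append(s.charAt(0));
    for (int i = 1; i < s.length(); i++) {
      char c = s.charAt(i);
      if ("AEIOU".indexOf(c) < 0) noVowels.append(c);
    }

    // Rule 2: remove the second letter of any doubled letters.
    StringBuilder collapsed = new StringBuilder();
    for (int i = 0; i < noVowels.length(); i++) {
      char c = noVowels.charAt(i);
      if (collapsed.length() == 0 || collapsed.charAt(collapsed.length() - 1) != c) {
        collapsed.append(c);
      }
    }

    // Rule 3: reduce to six letters by joining the first three and last three.
    String code = collapsed.toString();
    return code.length() <= 6 ? code : code.substring(0, 3) + code.substring(code.length() - 3);
  }

  public static void main(String[] args) {
    System.out.println(encode("Catherine")); // CTHRN
    System.out.println(encode("Kathryn"));   // KTHRYN
  }
}
{code}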
[jira] [Commented] (SOLR-3245) Poor performance of Hunspell with Polish Dictionary
[ https://issues.apache.org/jira/browse/SOLR-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481277#comment-13481277 ] Romain MERESSE commented on SOLR-3245: -- Same problem here, with French dictionary in Solr 3.6 With Hunspell : ~5 documents/s Without Hunspell : ~280 documents/s Someone got a solution ? ... Poor performance of Hunspell with Polish Dictionary --- Key: SOLR-3245 URL: https://issues.apache.org/jira/browse/SOLR-3245 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 4.0-ALPHA Environment: Centos 6.2, kernel 2.6.32, 2 physical CPU Xeon 5606 (4 cores each), 32 GB RAM, 2 SSD disks in RAID 0, java version 1.6.0_26, java settings -server -Xms4096M -Xmx4096M Reporter: Agnieszka Labels: performance Attachments: pl_PL.zip In Solr 4.0 Hunspell stemmer with polish dictionary has poor performance whereas performance of hunspell from http://code.google.com/p/lucene-hunspell/ in solr 3.4 is very good. Tests shows: Solr 3.4, full import 489017 documents: StempelPolishStemFilterFactory - 2908 seconds, 168 docs/sec HunspellStemFilterFactory - 3922 seconds, 125 docs/sec Solr 4.0, full import 489017 documents: StempelPolishStemFilterFactory - 3016 seconds, 162 docs/sec HunspellStemFilterFactory - 44580 seconds (more than 12 hours), 11 docs/sec My schema is quit easy. For Hunspell I have one text field I copy 14 text fields to: {code:xml} field name=text type=text_pl_hunspell indexed=true stored=false multiValued=true/ copyField source=field1 dest=text/ copyField source=field14 dest=text/ {code} The text_pl_hunspell configuration: {code:xml} fieldType name=text_pl_hunspell class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=dict/stopwords_pl.txt enablePositionIncrements=true / filter class=solr.LowerCaseFilterFactory/ filter class=solr.HunspellStemFilterFactory dictionary=dict/pl_PL.dic affix=dict/pl_PL.aff ignoreCase=true !--filter class=solr.KeywordMarkerFilterFactory protected=protwords_pl.txt/-- /analyzer analyzer type=query tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=dict/synonyms_pl.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=dict/stopwords_pl.txt enablePositionIncrements=true / filter class=solr.LowerCaseFilterFactory/ filter class=solr.HunspellStemFilterFactory dictionary=dict/pl_PL.dic affix=dict/pl_PL.aff ignoreCase=true filter class=solr.KeywordMarkerFilterFactory protected=dict/protwords_pl.txt/ /analyzer /fieldType {code} I use Polish dictionary (files stopwords_pl.txt, protwords_pl.txt, synonyms_pl.txt are empy)- pl_PL.dic, pl_PL.aff. These are the same files I used in 3.4 version. 
For the Polish Stemmer the difference is only in the definition of the text field:
{code}
<field name="text" type="text_pl" indexed="true" stored="false" multiValued="true"/>
<fieldType name="text_pl" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="dict/stopwords_pl.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StempelPolishStemFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="dict/protwords_pl.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="dict/synonyms_pl.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="dict/stopwords_pl.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StempelPolishStemFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="dict/protwords_pl.txt"/>
  </analyzer>
</fieldType>
{code}
One document has 23 fields: - 14 text fields copied to one text field (above) that is only indexed - 8 other indexed fields (2 strings, 2 tdates, 3 tint, 1 tfloat) The size of one document is 3-4 kB. -- This message is automatically generated by JIRA. If you think it
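When chasing a regression like this it helps to time the analysis chain in isolation from the rest of indexing. A small harness along the following lines (standard Lucene 4.x TokenStream consumption) gives a tokens-per-second figure to compare between the text_pl and text_pl_hunspell chains; constructing those two analyzers from the schema is left out here, so treat it as a sketch:
{code:java}
import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;

public class AnalyzerTimer {
  /** Pushes the sample text through the analyzer repeatedly and returns tokens per second. */
  static double tokensPerSecond(Analyzer analyzer, String text, int iterations) throws IOException {
    long tokens = 0;
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      TokenStream ts = analyzer.tokenStream("text", new StringReader(text));
      ts.reset();
      while (ts.incrementToken()) {
        tokens++;
      }
      ts.end();
      ts.close();
    }
    return tokens / ((System.nanoTime() - start) / 1e9);
  }
}
{code}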
[jira] [Comment Edited] (SOLR-3245) Poor performance of Hunspell with Polish Dictionary
[ https://issues.apache.org/jira/browse/SOLR-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481277#comment-13481277 ] Romain MERESSE edited comment on SOLR-3245 at 10/22/12 9:51 AM: Same problem here, with French dictionary in Solr 3.6 With Hunspell : ~5 documents/s Without Hunspell : ~280 documents/s Someone got a solution ? ... Quite sad as this is a very important feature (stemming is poor with Snowball) was (Author: rohk): Same problem here, with French dictionary in Solr 3.6 With Hunspell : ~5 documents/s Without Hunspell : ~280 documents/s Someone got a solution ? ... Poor performance of Hunspell with Polish Dictionary --- Key: SOLR-3245 URL: https://issues.apache.org/jira/browse/SOLR-3245 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 4.0-ALPHA Environment: Centos 6.2, kernel 2.6.32, 2 physical CPU Xeon 5606 (4 cores each), 32 GB RAM, 2 SSD disks in RAID 0, java version 1.6.0_26, java settings -server -Xms4096M -Xmx4096M Reporter: Agnieszka Labels: performance Attachments: pl_PL.zip In Solr 4.0 Hunspell stemmer with polish dictionary has poor performance whereas performance of hunspell from http://code.google.com/p/lucene-hunspell/ in solr 3.4 is very good. Tests shows: Solr 3.4, full import 489017 documents: StempelPolishStemFilterFactory - 2908 seconds, 168 docs/sec HunspellStemFilterFactory - 3922 seconds, 125 docs/sec Solr 4.0, full import 489017 documents: StempelPolishStemFilterFactory - 3016 seconds, 162 docs/sec HunspellStemFilterFactory - 44580 seconds (more than 12 hours), 11 docs/sec My schema is quit easy. For Hunspell I have one text field I copy 14 text fields to: {code:xml} field name=text type=text_pl_hunspell indexed=true stored=false multiValued=true/ copyField source=field1 dest=text/ copyField source=field14 dest=text/ {code} The text_pl_hunspell configuration: {code:xml} fieldType name=text_pl_hunspell class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=dict/stopwords_pl.txt enablePositionIncrements=true / filter class=solr.LowerCaseFilterFactory/ filter class=solr.HunspellStemFilterFactory dictionary=dict/pl_PL.dic affix=dict/pl_PL.aff ignoreCase=true !--filter class=solr.KeywordMarkerFilterFactory protected=protwords_pl.txt/-- /analyzer analyzer type=query tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=dict/synonyms_pl.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=dict/stopwords_pl.txt enablePositionIncrements=true / filter class=solr.LowerCaseFilterFactory/ filter class=solr.HunspellStemFilterFactory dictionary=dict/pl_PL.dic affix=dict/pl_PL.aff ignoreCase=true filter class=solr.KeywordMarkerFilterFactory protected=dict/protwords_pl.txt/ /analyzer /fieldType {code} I use Polish dictionary (files stopwords_pl.txt, protwords_pl.txt, synonyms_pl.txt are empy)- pl_PL.dic, pl_PL.aff. These are the same files I used in 3.4 version. 
For Polish Stemmer the diffrence is only in definion text field: {code} field name=text type=text_pl indexed=true stored=false multiValued=true/ fieldType name=text_pl class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=dict/stopwords_pl.txt enablePositionIncrements=true / filter class=solr.LowerCaseFilterFactory/ filter class=solr.StempelPolishStemFilterFactory/ filter class=solr.KeywordMarkerFilterFactory protected=dict/protwords_pl.txt/ /analyzer analyzer type=query tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=dict/synonyms_pl.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=dict/stopwords_pl.txt enablePositionIncrements=true / filter class=solr.LowerCaseFilterFactory/ filter class=solr.StempelPolishStemFilterFactory/ filter class=solr.KeywordMarkerFilterFactory
[jira] [Created] (SOLR-3974) Disabling External entity resolution when using XSL in DIH
Stephane Gamard created SOLR-3974: - Summary: Disabling External entity resolution when using XSL in DIH Key: SOLR-3974 URL: https://issues.apache.org/jira/browse/SOLR-3974 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 4.0, 4.1 Reporter: Stephane Gamard When using XSL transformation in DIH Solr tries to resolve DTD and fails when missing. This is similar to SOLR-3895 (which is solely intended to the RequestHandler). Sample data-config.xml: {code:xml} entity name=sample processor=FileListEntityProcessor baseDir=/Volumes/data/datasets/sample fileName=^.*\.xml$ recursive=true rootEntity=false dataSource=null entity name=article stream=false xsl=xslt/toDocument.xslt processor=XPathEntityProcessor url=${sample.fileAbsolutePath} useSolrAddSchema=true /entity /entity {code} Import will fail with the following error: {code} Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Exception in applying XSL Transformeation Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:304) at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:224) at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:204) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:472) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:498) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:411) ... 5 more Caused by: javax.xml.transform.TransformerException: javax.xml.transform.TransformerException: com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: /opt/solr/archivearticle3.dtd (No such file or directory) at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:735) at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:336) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:299) ... 11 more Caused by: javax.xml.transform.TransformerException: com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: /opt/solr/archivearticle3.dtd (No such file or directory) at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getDOM(TransformerImpl.java:564) at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:725) ... 13 more Caused by: com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: /opt/solr/archivearticle3.dtd (No such file or directory) at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(XSLTCDTMManager.java:460) at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(XSLTCDTMManager.java:248) at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getDOM(TransformerImpl.java:542) ... 14 more {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
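Until DIH itself exposes a switch for this, the JAXP-level cure is to hand the Transformer a SAXSource whose XMLReader is told not to load external DTDs (and, belt-and-braces, resolves any remaining entity to an empty document). The sketch below shows the technique standalone; inside Solr it would have to happen where XPathEntityProcessor builds its Source. The Xerces feature URI is the one understood by the JDK's bundled parser, the stylesheet path comes from the data-config above, and the input file name is a placeholder.
{code:java}
import java.io.File;
import java.io.FileInputStream;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class NoDtdTransform {
  public static void main(String[] args) throws Exception {
    SAXParserFactory spf = SAXParserFactory.newInstance();
    spf.setNamespaceAware(true);
    // Ask the parser not to fetch external DTDs at all (Xerces-specific feature).
    spf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

    XMLReader reader = spf.newSAXParser().getXMLReader();
    // Fallback: resolve anything the parser still asks for to an empty document.
    reader.setEntityResolver(new EntityResolver() {
      public InputSource resolveEntity(String publicId, String systemId) {
        return new InputSource(new StringReader(""));
      }
    });

    Transformer t = TransformerFactory.newInstance()
        .newTransformer(new StreamSource(new File("xslt/toDocument.xslt")));
    t.transform(new SAXSource(reader, new InputSource(new FileInputStream("article.xml"))),
        new StreamResult(System.out));
  }
}
{code}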
[jira] [Updated] (SOLR-3974) Disabling External entity resolution when using XSL in DIH
[ https://issues.apache.org/jira/browse/SOLR-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephane Gamard updated SOLR-3974: -- Component/s: update Disabling External entity resolution when using XSL in DIH -- Key: SOLR-3974 URL: https://issues.apache.org/jira/browse/SOLR-3974 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler, update Affects Versions: 4.0, 4.1 Reporter: Stephane Gamard When using XSL transformation in DIH Solr tries to resolve DTD and fails when missing. This is similar to SOLR-3895 (which is solely intended to the RequestHandler). Sample data-config.xml: {code:xml} entity name=sample processor=FileListEntityProcessor baseDir=/Volumes/data/datasets/sample fileName=^.*\.xml$ recursive=true rootEntity=false dataSource=null entity name=article stream=false xsl=xslt/toDocument.xslt processor=XPathEntityProcessor url=${sample.fileAbsolutePath} useSolrAddSchema=true /entity /entity {code} Import will fail with the following error: {code} Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Exception in applying XSL Transformeation Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:304) at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:224) at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:204) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:472) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:498) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:411) ... 5 more Caused by: javax.xml.transform.TransformerException: javax.xml.transform.TransformerException: com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: /opt/solr/archivearticle3.dtd (No such file or directory) at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:735) at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:336) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:299) ... 11 more Caused by: javax.xml.transform.TransformerException: com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: /opt/solr/archivearticle3.dtd (No such file or directory) at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getDOM(TransformerImpl.java:564) at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:725) ... 13 more Caused by: com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: /opt/solr/archivearticle3.dtd (No such file or directory) at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(XSLTCDTMManager.java:460) at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(XSLTCDTMManager.java:248) at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getDOM(TransformerImpl.java:542) ... 14 more {code} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-3975) Document Summarization toolkit, using LSA techniques
Lance Norskog created SOLR-3975: --- Summary: Document Summarization toolkit, using LSA techniques Key: SOLR-3975 URL: https://issues.apache.org/jira/browse/SOLR-3975 Project: Solr Issue Type: New Feature Reporter: Lance Norskog Priority: Minor Attachments: 4.1.summary.patch, reuters.sh This package analyzes sentences and words as used across sentences to rank the most important sentences and words. The general topic is called document summarization and is a popular research topic in textual analysis. How to use: 1) Check out the 4.x branch, apply the patch, build, and run the solr/example instance. 2) Download the first Reuters article corpus from: http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.tar.gz 3) Unpack this into a directory. 4) Run the attached 'reuters.sh' script: sh reuters.sh directory http://localhost:8983/solr/collection1 5) Wait several minutes. Now go to http://localhost:8983/solr/collection1/browse?summary=true and look at the large gray box marked 'Document Summary'. This has a table of statistics about the analysis, the three most important sentences, and several of the most important words in the documents. The sentences have the important tags in italics. The code is packaged as a search component and as an analysis handler. The /browse demo uses the search component, and you can also post raw text to http://localhost:8983/solr/collection1/analysis/summary. Here is a sample command: curl -s http://localhost:8983/solr/analysis/summary?indent=trueechoParams=allfile=$FILEwt=xml; --data-binary @$FILE -H 'Content-type:application/xml' This is an implementation of LSA-based document summarization. A short explanation and a long evaluation are described in my blog, [Uncle Lance's Ultra Whiz Bang|http://ultrawhizbang.blogspot.com], starting here: [http://ultrawhizbang.blogspot.com/2012/09/document-summarization-with-lsa-1.html] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
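If you would rather drive the analysis handler from Java than from curl, the same request can be sent with HttpURLConnection. A minimal sketch, with the URL and parameters copied from the curl example above and with error handling omitted:
{code:java}
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class PostSummary {
  public static void main(String[] args) throws Exception {
    File file = new File(args[0]);
    URL url = new URL("http://localhost:8983/solr/analysis/summary"
        + "?indent=true&echoParams=all&wt=xml&file=" + file.getName());

    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setDoOutput(true);
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "application/xml");

    // Send the raw file body, mirroring curl's --data-binary @$FILE.
    OutputStream out = conn.getOutputStream();
    InputStream in = new FileInputStream(file);
    byte[] buf = new byte[8192];
    for (int n; (n = in.read(buf)) != -1; ) {
      out.write(buf, 0, n);
    }
    in.close();
    out.close();

    // Print the XML response.
    BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
    for (String line; (line = reader.readLine()) != null; ) {
      System.out.println(line);
    }
    reader.close();
  }
}
{code}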
[jira] [Updated] (SOLR-3975) Document Summarization toolkit, using LSA techniques
[ https://issues.apache.org/jira/browse/SOLR-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Norskog updated SOLR-3975: Attachment: reuters.sh 4.1.summary.patch Document Summarization toolkit, using LSA techniques Key: SOLR-3975 URL: https://issues.apache.org/jira/browse/SOLR-3975 Project: Solr Issue Type: New Feature Reporter: Lance Norskog Priority: Minor Attachments: 4.1.summary.patch, reuters.sh This package analyzes sentences and words as used across sentences to rank the most important sentences and words. The general topic is called document summarization and is a popular research topic in textual analysis. How to use: 1) Check out the 4.x branch, apply the patch, build, and run the solr/example instance. 2) Download the first Reuters article corpus from: http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.tar.gz 3) Unpack this into a directory. 4) Run the attached 'reuters.sh' script: sh reuters.sh directory http://localhost:8983/solr/collection1 5) Wait several minutes. Now go to http://localhost:8983/solr/collection1/browse?summary=true and look at the large gray box marked 'Document Summary'. This has a table of statistics about the analysis, the three most important sentences, and several of the most important words in the documents. The sentences have the important tags in italics. The code is packaged as a search component and as an analysis handler. The /browse demo uses the search component, and you can also post raw text to http://localhost:8983/solr/collection1/analysis/summary. Here is a sample command: curl -s http://localhost:8983/solr/analysis/summary?indent=trueechoParams=allfile=$FILEwt=xml; --data-binary @$FILE -H 'Content-type:application/xml' This is an implementation of LSA-based document summarization. A short explanation and a long evaluation are described in my blog, [Uncle Lance's Ultra Whiz Bang|http://ultrawhizbang.blogspot.com], starting here: [http://ultrawhizbang.blogspot.com/2012/09/document-summarization-with-lsa-1.html] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3975) Document Summarization toolkit, using LSA techniques
[ https://issues.apache.org/jira/browse/SOLR-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Norskog updated SOLR-3975: Description: This package analyzes sentences and words as used across sentences to rank the most important sentences and words. The general topic is called document summarization and is a popular research topic in textual analysis. How to use: 1) Check out the 4.x branch, apply the patch, build, and run the solr/example instance. 2) Download the first Reuters article corpus from: http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.tar.gz 3) Unpack this into a directory. 4) Run the attached 'reuters.sh' script: sh reuters.sh directory http://localhost:8983/solr/collection1 5) Wait several minutes. Now go to http://localhost:8983/solr/collection1/browse?summary=true and look at the large gray box marked 'Document Summary'. This has a table of statistics about the analysis, the three most important sentences, and several of the most important words in the documents. The sentences have the important words in italics. The code is packaged as a search component and as an analysis handler. The /browse demo uses the search component, and you can also post raw text to http://localhost:8983/solr/collection1/analysis/summary. Here is a sample command: {code} curl -s http://localhost:8983/solr/analysis/summary?indent=trueechoParams=allfile=$FILEwt=xml; --data-binary @$FILE -H 'Content-type:application/xml' {code} This is an implementation of LSA-based document summarization. A short explanation and a long evaluation are described in my blog, [Uncle Lance's Ultra Whiz Bang|http://ultrawhizbang.blogspot.com], starting here: [http://ultrawhizbang.blogspot.com/2012/09/document-summarization-with-lsa-1.html] was: This package analyzes sentences and words as used across sentences to rank the most important sentences and words. The general topic is called document summarization and is a popular research topic in textual analysis. How to use: 1) Check out the 4.x branch, apply the patch, build, and run the solr/example instance. 2) Download the first Reuters article corpus from: http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.tar.gz 3) Unpack this into a directory. 4) Run the attached 'reuters.sh' script: sh reuters.sh directory http://localhost:8983/solr/collection1 5) Wait several minutes. Now go to http://localhost:8983/solr/collection1/browse?summary=true and look at the large gray box marked 'Document Summary'. This has a table of statistics about the analysis, the three most important sentences, and several of the most important words in the documents. The sentences have the important tags in italics. The code is packaged as a search component and as an analysis handler. The /browse demo uses the search component, and you can also post raw text to http://localhost:8983/solr/collection1/analysis/summary. Here is a sample command: curl -s http://localhost:8983/solr/analysis/summary?indent=trueechoParams=allfile=$FILEwt=xml; --data-binary @$FILE -H 'Content-type:application/xml' This is an implementation of LSA-based document summarization. 
A short explanation and a long evaluation are described in my blog, [Uncle Lance's Ultra Whiz Bang|http://ultrawhizbang.blogspot.com], starting here: [http://ultrawhizbang.blogspot.com/2012/09/document-summarization-with-lsa-1.html] Document Summarization toolkit, using LSA techniques Key: SOLR-3975 URL: https://issues.apache.org/jira/browse/SOLR-3975 Project: Solr Issue Type: New Feature Reporter: Lance Norskog Priority: Minor Attachments: 4.1.summary.patch, reuters.sh This package analyzes sentences and words as used across sentences to rank the most important sentences and words. The general topic is called document summarization and is a popular research topic in textual analysis. How to use: 1) Check out the 4.x branch, apply the patch, build, and run the solr/example instance. 2) Download the first Reuters article corpus from: http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.tar.gz 3) Unpack this into a directory. 4) Run the attached 'reuters.sh' script: sh reuters.sh directory http://localhost:8983/solr/collection1 5) Wait several minutes. Now go to http://localhost:8983/solr/collection1/browse?summary=true and look at the large gray box marked 'Document Summary'. This has a table of statistics about the analysis, the three most important sentences, and several of the most important words in the documents. The sentences have the important words in italics. The code is packaged as a search component and as an analysis handler. The /browse demo uses the search component, and you can also post raw text to
[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores
[ https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481296#comment-13481296 ] Erick Erickson commented on SOLR-1293: -- Well, I think this JIRA will finally get some action... Jose: The actual availability of any particular feature is best tracked by the actual JIRA ticket. The fix version is usually the earliest _possible_ fix. Not until the resolution is something like fixed is the code really in the code line. All: OK, I'm thinking along these lines. I've started implementation, but wanted to open up the discussion in case I'm going down the wrong path. Assumption: 1 For installations with multiple thousands of cores, provision has to me made for some kind of administrative process, probably an RDBMS that really maintains this information. So here's a brief outline of the approach I'm thinking about. 1 Add an additional optional parameter to the cores entry in solr.xml, LRUCacheSize=#. (what default?) 2 Implement SOLR-1306, allow a data provider to be specified in solr.xml that gives back core descriptions, something like: coreDescriptorProvider class=com.foo.FooDataProvider [attr=val]/ (don't quite know what attrs we want, if any). 3 Add two optional attributes to individual core entries a sticky=true|false. Default to true. Any cores marked with this would never be aged out, essentially treat them just as current. b loadOnStartup=true|false, default to true. 4 so the process of getting a core would be something like a check the normal list, just like now. If a core was found, return it. b Check the LRU list, if a core was found, return it. c ask the dataprovider (if defined) for the core descriptor. create the core and put it in the LRU list. d remove any core entries over the LRU limit. Any hints on the right cache to use? There's the Lucene LRUCache, ConcurrentLRUCache, the LRUHashMap in lucene that I can't find in any of the compiled jars). I've got to close the core as it's removed It _looks_ like I can use ConcurrentLRUCache and add a listener to close the core when it's removed from the list. Processing-wise, in the usual case this would cost an extra check each time a core was fetched. If a above failed, we would have to see if the dataprovider was defined before returning null. I don't think that's onerous, the rest of the costs would only be incurred when a dataprovider _did_ exist. But one design decisions here is along these lines. What to do with persistence and stickiness? Specifically, if the coreDescriptorProvider gives us a core from, say, an RDBMS, should we allow that core to be persisted into the solr.xml file if they've set persist=true in solr.xml? I'm thinking that we can make this all work with maximum flexibility if we allow the coreDataProvider to tell us whether we should persist any core currently loaded Anyway, I'll be fleshing this out over the next little while, anybody want to weigh in? Erick Support for large no:of cores and faster loading/unloading of cores --- Key: SOLR-1293 URL: https://issues.apache.org/jira/browse/SOLR-1293 Project: Solr Issue Type: New Feature Components: multicore Reporter: Noble Paul Fix For: 4.1 Attachments: SOLR-1293.patch Solr , currently ,is not very suitable for a large no:of homogeneous cores where you require fast/frequent loading/unloading of cores . usually a core is required to be loaded just to fire a search query or to just index one document The requirements of such a system are. * Very efficient loading of cores . 
Solr cannot afford to read and parse and create Schema, SolrConfig Objects for each core each time the core has to be loaded ( SOLR-919 , SOLR-920) * START STOP core . Currently it is only possible to unload a core (SOLR-880) * Automatic loading of cores . If a core is present and it is not loaded and a request comes for that load it automatically before serving up a request * As there are a large no:of cores , all the cores cannot be kept loaded always. There has to be an upper limit beyond which we need to unload a few cores (probably the least recently used ones) * Automatic allotment of dataDir for cores. If the no:of cores is too high al the cores' dataDirs cannot live in the same dir. There is an upper limit on the no:of dirs you can create in a unix dir w/o affecting performance -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
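On the "which cache" question in Erick's comment: an access-ordered java.util.LinkedHashMap with removeEldestEntry overridden already gives close-on-eviction with very little code. This is only a sketch of the idea -- the Closeable bound stands in for SolrCore, sticky cores would simply never be put in it, and real use inside CoreContainer would need external synchronization:
{code:java}
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCoreCache<K, C extends Closeable> extends LinkedHashMap<K, C> {
  private final int maxSize;

  public LruCoreCache(int maxSize) {
    super(16, 0.75f, true);   // access-order: get() refreshes an entry's recency
    this.maxSize = maxSize;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, C> eldest) {
    if (size() <= maxSize) {
      return false;
    }
    try {
      eldest.getValue().close();   // close the least recently used core before dropping it
    } catch (IOException e) {
      // log and carry on; eviction should not fail the lookup that triggered it
    }
    return true;
  }
}
{code}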
[jira] [Commented] (LUCENE-4496) Don't decode unnecessary freq blocks in 4.1 codec
[ https://issues.apache.org/jira/browse/LUCENE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481299#comment-13481299 ] Michael McCandless commented on LUCENE-4496: +1 Don't decode unnecessary freq blocks in 4.1 codec - Key: LUCENE-4496 URL: https://issues.apache.org/jira/browse/LUCENE-4496 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Affects Versions: 4.1 Reporter: Robert Muir Attachments: LUCENE-4496.patch, LUCENE-4496.patch TermsEnum.docs() has an expert flag to specify you don't require frequencies. This is currently set by some things that don't need it: we should call ForUtil.skipBlock instead of ForUtil.readBlock in this case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
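For readers without the patch in front of them, the change being +1'd amounts to a branch like the one below when the next block of 128 frequencies is reached. The BlockCodec interface is only a stand-in for ForUtil and the surrounding BlockPostingsReader state; this is an illustration of the idea, not the attached patch:
{code:java}
import java.io.IOException;

public class FreqBlockChoice {
  /** Stand-in for the two ForUtil calls mentioned in the issue. */
  interface BlockCodec {
    void readBlock() throws IOException;  // decode 128 freqs into a buffer
    void skipBlock() throws IOException;  // advance the file pointer without decoding
  }

  static void nextFreqBlock(BlockCodec forUtil, boolean indexHasFreq, boolean callerNeedsFreqs)
      throws IOException {
    if (indexHasFreq) {
      if (callerNeedsFreqs) {
        forUtil.readBlock();   // frequencies were requested, decode them
      } else {
        forUtil.skipBlock();   // the expert flag said freqs are not needed, so don't decode
      }
    }
  }
}
{code}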
entry with enumeration
Hello, I would like to know whether it is possible to automatically create index entries that combine different terms (a concatenation of the terms) into a single entry in the index. The condition for creating this enumeration of terms would be a document property. For example: Nodes: [c1,c5] Employee: c1,c2,c3,c4 Person: c2,c3,c5 Sector: c3,c4,c5 I would like to create this automatically: Employee###Person: c2,c3 Employee###Sector: c3,c4 Person###Sector: c3,c5 -- View this message in context: http://lucene.472066.n3.nabble.com/entry-with-enumeration-tp4015097.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
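One way to get this without any custom Lucene code is to generate the concatenated terms yourself at index time and put them in an extra field; the posting list for Employee###Person then contains exactly the documents that carry both labels (c2 and c3 in the example). A minimal sketch of the pair generation (nothing here is an existing Lucene API):
{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PairTerms {
  /** For the labels of one document, emit every sorted pair joined with "###". */
  static List<String> pairTerms(List<String> labels) {
    List<String> sorted = new ArrayList<String>(labels);
    Collections.sort(sorted);   // so Employee###Person and Person###Employee become the same term
    List<String> pairs = new ArrayList<String>();
    for (int i = 0; i < sorted.size(); i++) {
      for (int j = i + 1; j < sorted.size(); j++) {
        pairs.add(sorted.get(i) + "###" + sorted.get(j));
      }
    }
    return pairs;
  }
}
{code}
Note that the number of pairs grows quadratically with the number of labels on a document, so this only stays cheap for small label sets.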
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.6.0_35) - Build # 1925 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/1925/ Java: 32bit/jdk1.6.0_35 -server -XX:+UseParallelGC All tests passed Build Log: [...truncated 23483 lines...] BUILD FAILED /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:60: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:517: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1937: java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:168) at com.sun.net.ssl.internal.ssl.InputRecord.readFully(InputRecord.java:293) at com.sun.net.ssl.internal.ssl.InputRecord.read(InputRecord.java:331) at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:863) at com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1203) at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1230) at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1214) at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:434) at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:166) at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:133) at org.apache.tools.ant.taskdefs.Get$GetThread.openConnection(Get.java:660) at org.apache.tools.ant.taskdefs.Get$GetThread.get(Get.java:579) at org.apache.tools.ant.taskdefs.Get$GetThread.run(Get.java:569) Total time: 28 minutes 56 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Description set: Java: 32bit/jdk1.6.0_35 -server -XX:+UseParallelGC Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4476) maven deployment scripts dont work (except from the machine you made the RC from)
[ https://issues.apache.org/jira/browse/LUCENE-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481327#comment-13481327 ] Steven Rowe commented on LUCENE-4476: - {quote} bq. Does this also happen on windows if you sign artifacts with your GPG key? Definitely not! The password is hidden! This is clearly a cygwin issue (and only if you use the cygwin console window). With the official Windows 7 cmd.exe in the official Windows console window the password is not shown. I never use cygwin for builfding on windows, why do you Steven? To run ANT and build artifacts a plain cmd.exe is fine. {quote} I agree, Uwe - password hiding with Ant's secure input handler works on Win7 cmd window for me too. Definitely a cygwin-specific issue. I use bash under an Xterm, because I feel like it :) - it's the maximally Unix-ish experience on Windows. Also, when mixing native binaries and Cygwin binaries, it's easier to use Cygwin tools to keep everybody happy from bash.exe, rather than from cmd.exe. Also, the Xterm window is resizeable (win console has a fixed width) and is more customizable. {quote} Ah, also: if you run bash.exe in the official Windows console windows (not cygwin's own), it also works. It's a bug of the dumb cygwin-internal console window only (why do they have it?) - sorry, I have to rant about Cygwin; I use it, too, but only to execute find/sed/grep... {quote} (And perl, and python, and .) Interesting, I hadn't considered running bash under the windows console. Of course C:\cygwin\bin\ would have to be on the path. I agree the cygwin-internal console window is sucky - I never use it. maven deployment scripts dont work (except from the machine you made the RC from) - Key: LUCENE-4476 URL: https://issues.apache.org/jira/browse/LUCENE-4476 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4476.patch, LUCENE-4476.patch, LUCENE-4476.patch Currently the maven process described in http://wiki.apache.org/lucene-java/PublishMavenArtifacts does not work (on mac) It worked fine for the 4.0-alpha and 4.0-beta releases. NOTE: This appears to be working on linux so I am going with that. But this seems strange it doesnt work on mac. {noformat} artifact:install-provider] Installing provider: org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-7:runtime [artifact:pom] Downloading: org/apache/lucene/lucene-parent/4.0.0/lucene-parent-4.0.0.pom from repository sonatype.releases at http://oss.sonatype.org/content/repositories/releases [artifact:pom] Unable to locate resource in repository [artifact:pom] [INFO] Unable to find resource 'org.apache.lucene:lucene-parent:pom:4.0.0' in repository sonatype.releases (http://oss.sonatype.org/content/repositories/releases) [artifact:pom] Downloading: org/apache/lucene/lucene-parent/4.0.0/lucene-parent-4.0.0.pom from repository central at http://repo1.maven.org/maven2 [artifact:pom] Unable to locate resource in repository [artifact:pom] [INFO] Unable to find resource 'org.apache.lucene:lucene-parent:pom:4.0.0' in repository central (http://repo1.maven.org/maven2) [artifact:pom] An error has occurred while processing the Maven artifact tasks. 
[artifact:pom] Diagnosis: [artifact:pom] [artifact:pom] Unable to initialize POM lucene-test-framework-4.0.0.pom: Cannot find parent: org.apache.lucene:lucene-parent for project: org.apache.lucene:lucene-test-framework:jar:null for project org.apache.lucene:lucene-test-framework:jar:null [artifact:pom] Unable to download the artifact from any repository BUILD FAILED {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-3976) HTMLStripTransformer strips the tika field not the field to index - cannot have both (stripped and unstripped)
Markus Klose created SOLR-3976: -- Summary: HTMLStripTransformer strips the tika field not the field to index - cannot have both (stripped and unstripped) Key: SOLR-3976 URL: https://issues.apache.org/jira/browse/SOLR-3976 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 3.6 Reporter: Markus Klose Priority: Minor I run into the situation to index an html file using the dataimport handler and got an unexpected output. I wanted to create one field with the original content and one field with the same content but without html markup. If I enaple the HTMLStripTransformer at field text2 the other one (text1) is striped as well example configuraion: dataConfig dataSource type=BinFileDataSource name=bin/ document entity name=f processor=FileListEntityProcessor recursive=true rootEntity=false dataSource=null baseDir= fileName=.*.html onError=skip entity name=tika-test processor=TikaEntityProcessor url=${f.fileAbsolutePath} format=html dataSource=bin onError=skip transformer=HTMLStripTransformer,TemplateTransformer field column=id template=${f.file}/ field column=text name=text1 / field column=text name=text2 stripHTML=true/ /entity /entity /document /dataConfig -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
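Independently of how DIH wires its transformers, the stripping itself is done by Lucene's HTMLStripCharFilter, so you can check what the stripped copy of a file will look like with a few lines of Java. A sketch assuming the 4.x class (org.apache.lucene.analysis.charfilter.HTMLStripCharFilter, whose constructor takes a plain Reader); in 3.6 the equivalent class lives under org.apache.solr.analysis and wraps a CharStream instead:
{code:java}
import java.io.FileReader;
import java.io.Reader;
import org.apache.lucene.analysis.charfilter.HTMLStripCharFilter;

public class StripHtmlFile {
  public static void main(String[] args) throws Exception {
    Reader stripped = new HTMLStripCharFilter(new FileReader(args[0]));
    StringBuilder text = new StringBuilder();
    char[] buf = new char[4096];
    for (int n; (n = stripped.read(buf)) != -1; ) {
      text.append(buf, 0, n);
    }
    stripped.close();
    System.out.println(text);   // markup-free text; the original file still holds the unstripped copy
  }
}
{code}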
[jira] [Resolved] (SOLR-3976) HTMLStripTransformer strips the tika field not the field to index - cannot have both (stripped and unstripped)
[ https://issues.apache.org/jira/browse/SOLR-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-3976. -- Resolution: Not A Problem Please raise this kind of issue on the user's list rather than a JIRA first in case it has a simple resolution. In this case, I'd use a copyField from text1 to text2 in your schema.xml. HTMLStripTransformer strips the tika field not the field to index - cannot have both (stripped and unstripped) - Key: SOLR-3976 URL: https://issues.apache.org/jira/browse/SOLR-3976 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 3.6 Reporter: Markus Klose Priority: Minor I run into the situation to index an html file using the dataimport handler and got an unexpected output. I wanted to create one field with the original content and one field with the same content but without html markup. If I enaple the HTMLStripTransformer at field text2 the other one (text1) is striped as well example configuraion: dataConfig dataSource type=BinFileDataSource name=bin/ document entity name=f processor=FileListEntityProcessor recursive=true rootEntity=false dataSource=null baseDir= fileName=.*.html onError=skip entity name=tika-test processor=TikaEntityProcessor url=${f.fileAbsolutePath} format=html dataSource=bin onError=skip transformer=HTMLStripTransformer,TemplateTransformer field column=id template=${f.file}/ field column=text name=text1 / field column=text name=text2 stripHTML=true/ /entity /entity /document /dataConfig -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3976) HTMLStripTransformer strips the tika field not the field to index - cannot have both (stripped and unstripped)
[ https://issues.apache.org/jira/browse/SOLR-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481356#comment-13481356 ] Markus Klose commented on SOLR-3976: If it sounds like help me to index an html file I am sorry. I just tought that is a bug and should be posted here. Please close if necessary. We creadted a workaround with a sub entity like: dataConfig dataSource type=BinFileDataSource name=bin/ document entity name=f processor=FileListEntityProcessor recursive=true rootEntity=false dataSource=null baseDir=... fileName=.*.html onError=skip transformer=TemplateTransformer entity name=tika-test processor=TikaEntityProcessor url=${f.fileAbsolutePath} format=html dataSource=bin onError=skip transformer=TemplateTransformer,RegexTransformer,DateFormatTransformer,HTMLStripTransformer field column=id template=${f.file}/ field column=text name=text1/ entity name=tika2 processor=TikaEntityProcessor url=${f.fileAbsolutePath} format=html dataSource=bin onError=skip transformer=TemplateTransformer,HTMLStripTransformer field column=text name=text2 stripHTML=false/ /entity /entity /entity /document /dataConfig HTMLStripTransformer strips the tika field not the field to index - cannot have both (stripped and unstripped) - Key: SOLR-3976 URL: https://issues.apache.org/jira/browse/SOLR-3976 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 3.6 Reporter: Markus Klose Priority: Minor I run into the situation to index an html file using the dataimport handler and got an unexpected output. I wanted to create one field with the original content and one field with the same content but without html markup. If I enaple the HTMLStripTransformer at field text2 the other one (text1) is striped as well example configuraion: dataConfig dataSource type=BinFileDataSource name=bin/ document entity name=f processor=FileListEntityProcessor recursive=true rootEntity=false dataSource=null baseDir= fileName=.*.html onError=skip entity name=tika-test processor=TikaEntityProcessor url=${f.fileAbsolutePath} format=html dataSource=bin onError=skip transformer=HTMLStripTransformer,TemplateTransformer field column=id template=${f.file}/ field column=text name=text1 / field column=text name=text2 stripHTML=true/ /entity /entity /document /dataConfig -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-880) SolrCore should have a STOP option and a lazy startup option
[ https://issues.apache.org/jira/browse/SOLR-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson reassigned SOLR-880: --- Assignee: Erick Erickson (was: Shalin Shekhar Mangar) SolrCore should have a STOP option and a lazy startup option Key: SOLR-880 URL: https://issues.apache.org/jira/browse/SOLR-880 Project: Solr Issue Type: Improvement Components: multicore Reporter: Noble Paul Assignee: Erick Erickson * We must have an option to STOP and START a core. * a core should have an option of loadOnStartup=true|false. default should be true * A list command which can give the names of all cores and some meta information like status If there are too many cores (tens of thousands) where each of them may be used occassionally, we should not load all of them at once. In the runtime I should be able to STOP and START a core on demand. A listing command would let me know which one is present and what is up and what is down. A stopped core must not use any resource -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-3976) HTMLStripTransformer strips the tika field not the field to index - cannot have both (stripped and unstripped)
[ https://issues.apache.org/jira/browse/SOLR-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481356#comment-13481356 ] Markus Klose edited comment on SOLR-3976 at 10/22/12 1:35 PM: -- If it sounds like help me to index an html file I am sorry. I just tought that is a bug and should be posted here. Please close if necessary. We creadted a workaround with a sub entity like: dataConfig dataSource type=BinFileDataSource name=bin/ document entity name=f processor=FileListEntityProcessor recursive=true rootEntity=false dataSource=null baseDir=... fileName=.*.html onError=skip transformer=TemplateTransformer entity name=tika-test processor=TikaEntityProcessor url=${f.fileAbsolutePath} format=html dataSource=bin onError=skip transformer=TemplateTransformer,RegexTransformer,DateFormatTransformer,HTMLStripTransformer field column=id template=${f.file}/ field column=text name=text1/ entity name=tika2 processor=TikaEntityProcessor url=${f.fileAbsolutePath} format=html dataSource=bin onError=skip transformer=TemplateTransformer,HTMLStripTransformer field column=text name=text2 stripHTML=true/ /entity /entity /entity /document /dataConfig was (Author: markus-klose): If it sounds like help me to index an html file I am sorry. I just tought that is a bug and should be posted here. Please close if necessary. We creadted a workaround with a sub entity like: dataConfig dataSource type=BinFileDataSource name=bin/ document entity name=f processor=FileListEntityProcessor recursive=true rootEntity=false dataSource=null baseDir=... fileName=.*.html onError=skip transformer=TemplateTransformer entity name=tika-test processor=TikaEntityProcessor url=${f.fileAbsolutePath} format=html dataSource=bin onError=skip transformer=TemplateTransformer,RegexTransformer,DateFormatTransformer,HTMLStripTransformer field column=id template=${f.file}/ field column=text name=text1/ entity name=tika2 processor=TikaEntityProcessor url=${f.fileAbsolutePath} format=html dataSource=bin onError=skip transformer=TemplateTransformer,HTMLStripTransformer field column=text name=text2 stripHTML=false/ /entity /entity /entity /document /dataConfig HTMLStripTransformer strips the tika field not the field to index - cannot have both (stripped and unstripped) - Key: SOLR-3976 URL: https://issues.apache.org/jira/browse/SOLR-3976 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 3.6 Reporter: Markus Klose Priority: Minor I run into the situation to index an html file using the dataimport handler and got an unexpected output. I wanted to create one field with the original content and one field with the same content but without html markup. If I enaple the HTMLStripTransformer at field text2 the other one (text1) is striped as well example configuraion: dataConfig dataSource type=BinFileDataSource name=bin/ document entity name=f processor=FileListEntityProcessor recursive=true rootEntity=false dataSource=null baseDir= fileName=.*.html onError=skip entity name=tika-test processor=TikaEntityProcessor url=${f.fileAbsolutePath} format=html dataSource=bin onError=skip transformer=HTMLStripTransformer,TemplateTransformer field column=id template=${f.file}/ field column=text name=text1 / field column=text
[jira] [Assigned] (SOLR-1028) Automatic core loading unloading for multicore
[ https://issues.apache.org/jira/browse/SOLR-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson reassigned SOLR-1028: Assignee: Erick Erickson Automatic core loading unloading for multicore -- Key: SOLR-1028 URL: https://issues.apache.org/jira/browse/SOLR-1028 Project: Solr Issue Type: New Feature Components: multicore Reporter: Noble Paul Assignee: Erick Erickson Fix For: 4.1 usecase: I have many small cores (say one per user) on a single Solr box . All the cores are not be always needed . But when I need it I should be able to directly issue a search request and the core must be STARTED automatically and the request must be served. This also requires that I must have an upper limit on the no:of cores that should be loaded at any given point in time. If the limit is crossed the CoreContainer must unload a core (preferably the least recently used core) There must be a choice of specifying some cores as fixed. These cores must never be unloaded -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores
[ https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481368#comment-13481368 ] Jack Krupansky commented on SOLR-1293: -- bq. an RDBMS Is a full RDBMS needed? How about a NoSQL approach... like... um... Solr (or raw Lucene) itself? Support for large no:of cores and faster loading/unloading of cores --- Key: SOLR-1293 URL: https://issues.apache.org/jira/browse/SOLR-1293 Project: Solr Issue Type: New Feature Components: multicore Reporter: Noble Paul Fix For: 4.1 Attachments: SOLR-1293.patch Solr , currently ,is not very suitable for a large no:of homogeneous cores where you require fast/frequent loading/unloading of cores . usually a core is required to be loaded just to fire a search query or to just index one document The requirements of such a system are. * Very efficient loading of cores . Solr cannot afford to read and parse and create Schema, SolrConfig Objects for each core each time the core has to be loaded ( SOLR-919 , SOLR-920) * START STOP core . Currently it is only possible to unload a core (SOLR-880) * Automatic loading of cores . If a core is present and it is not loaded and a request comes for that load it automatically before serving up a request * As there are a large no:of cores , all the cores cannot be kept loaded always. There has to be an upper limit beyond which we need to unload a few cores (probably the least recently used ones) * Automatic allotment of dataDir for cores. If the no:of cores is too high al the cores' dataDirs cannot live in the same dir. There is an upper limit on the no:of dirs you can create in a unix dir w/o affecting performance -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-4497) Don't write posVIntCount in 4.1 codec
Robert Muir created LUCENE-4497: --- Summary: Don't write posVIntCount in 4.1 codec Key: LUCENE-4497 URL: https://issues.apache.org/jira/browse/LUCENE-4497 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Robert Muir Its confusing and unnecessary that we compute this from docFreq for the doc/freq vint count, but write it for the positions case: its totalTermFreq % BLOCK_SIZE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
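For readers skimming the issue, the point is that both leftover ("vInt") block lengths can be derived from statistics the reader already has, so nothing extra needs to be written for positions. Illustrative arithmetic only, not the attached patch:
{code:java}
public class LeftoverCounts {
  static final int BLOCK_SIZE = 128;

  /** Already derived this way for the doc/freq blocks. */
  static int docLeftover(int docFreq) {
    return docFreq % BLOCK_SIZE;
  }

  /** What LUCENE-4497 proposes to derive instead of writing posVIntCount. */
  static long posLeftover(long totalTermFreq) {
    return totalTermFreq % BLOCK_SIZE;
  }
}
{code}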
[jira] [Updated] (LUCENE-4497) Don't write posVIntCount in 4.1 codec
[ https://issues.apache.org/jira/browse/LUCENE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4497: Attachment: LUCENE-4497.patch Don't write posVIntCount in 4.1 codec - Key: LUCENE-4497 URL: https://issues.apache.org/jira/browse/LUCENE-4497 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Robert Muir Attachments: LUCENE-4497.patch Its confusing and unnecessary that we compute this from docFreq for the doc/freq vint count, but write it for the positions case: its totalTermFreq % BLOCK_SIZE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores
[ https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481370#comment-13481370 ] Erick Erickson commented on SOLR-1293: -- I don't care what's used to store the info. The provider that the user provides cares, but that's the point of getting that info through a custom component, Solr doesn't need to know. Nor should it G... Support for large no:of cores and faster loading/unloading of cores --- Key: SOLR-1293 URL: https://issues.apache.org/jira/browse/SOLR-1293 Project: Solr Issue Type: New Feature Components: multicore Reporter: Noble Paul Fix For: 4.1 Attachments: SOLR-1293.patch Solr , currently ,is not very suitable for a large no:of homogeneous cores where you require fast/frequent loading/unloading of cores . usually a core is required to be loaded just to fire a search query or to just index one document The requirements of such a system are. * Very efficient loading of cores . Solr cannot afford to read and parse and create Schema, SolrConfig Objects for each core each time the core has to be loaded ( SOLR-919 , SOLR-920) * START STOP core . Currently it is only possible to unload a core (SOLR-880) * Automatic loading of cores . If a core is present and it is not loaded and a request comes for that load it automatically before serving up a request * As there are a large no:of cores , all the cores cannot be kept loaded always. There has to be an upper limit beyond which we need to unload a few cores (probably the least recently used ones) * Automatic allotment of dataDir for cores. If the no:of cores is too high al the cores' dataDirs cannot live in the same dir. There is an upper limit on the no:of dirs you can create in a unix dir w/o affecting performance -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores
[ https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481372#comment-13481372 ] Noble Paul commented on SOLR-1293: -- Rdbms is not required. We ate managing that with the xml itself. Now that we have moved to zookeeper for cloud, we should piggyback on zookeeper for everything Support for large no:of cores and faster loading/unloading of cores --- Key: SOLR-1293 URL: https://issues.apache.org/jira/browse/SOLR-1293 Project: Solr Issue Type: New Feature Components: multicore Reporter: Noble Paul Fix For: 4.1 Attachments: SOLR-1293.patch Solr , currently ,is not very suitable for a large no:of homogeneous cores where you require fast/frequent loading/unloading of cores . usually a core is required to be loaded just to fire a search query or to just index one document The requirements of such a system are. * Very efficient loading of cores . Solr cannot afford to read and parse and create Schema, SolrConfig Objects for each core each time the core has to be loaded ( SOLR-919 , SOLR-920) * START STOP core . Currently it is only possible to unload a core (SOLR-880) * Automatic loading of cores . If a core is present and it is not loaded and a request comes for that load it automatically before serving up a request * As there are a large no:of cores , all the cores cannot be kept loaded always. There has to be an upper limit beyond which we need to unload a few cores (probably the least recently used ones) * Automatic allotment of dataDir for cores. If the no:of cores is too high al the cores' dataDirs cannot live in the same dir. There is an upper limit on the no:of dirs you can create in a unix dir w/o affecting performance -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores
[ https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481372#comment-13481372 ] Noble Paul edited comment on SOLR-1293 at 10/22/12 2:03 PM: Rdbms is not required. We are managing that with the xml itself. Now that we have moved to zookeeper for cloud, we should piggyback on zookeeper for everything was (Author: noble.paul): Rdbms is not required. We ate managing that with the xml itself. Now that we have moved to zookeeper for cloud, we should piggyback on zookeeper for everything Support for large no:of cores and faster loading/unloading of cores --- Key: SOLR-1293 URL: https://issues.apache.org/jira/browse/SOLR-1293 Project: Solr Issue Type: New Feature Components: multicore Reporter: Noble Paul Fix For: 4.1 Attachments: SOLR-1293.patch Solr , currently ,is not very suitable for a large no:of homogeneous cores where you require fast/frequent loading/unloading of cores . usually a core is required to be loaded just to fire a search query or to just index one document The requirements of such a system are. * Very efficient loading of cores . Solr cannot afford to read and parse and create Schema, SolrConfig Objects for each core each time the core has to be loaded ( SOLR-919 , SOLR-920) * START STOP core . Currently it is only possible to unload a core (SOLR-880) * Automatic loading of cores . If a core is present and it is not loaded and a request comes for that load it automatically before serving up a request * As there are a large no:of cores , all the cores cannot be kept loaded always. There has to be an upper limit beyond which we need to unload a few cores (probably the least recently used ones) * Automatic allotment of dataDir for cores. If the no:of cores is too high al the cores' dataDirs cannot live in the same dir. There is an upper limit on the no:of dirs you can create in a unix dir w/o affecting performance -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores
[ https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481375#comment-13481375 ] Jack Krupansky commented on SOLR-1293: -- bq. Solr doesn't need to know True, but what store would you propose using in unit tests? I suppose you could develop a Mock RDBMS which could be even simpler than Solr so unit tests don't need a solr running. Support for large no:of cores and faster loading/unloading of cores --- Key: SOLR-1293 URL: https://issues.apache.org/jira/browse/SOLR-1293 Project: Solr Issue Type: New Feature Components: multicore Reporter: Noble Paul Fix For: 4.1 Attachments: SOLR-1293.patch Solr , currently ,is not very suitable for a large no:of homogeneous cores where you require fast/frequent loading/unloading of cores . usually a core is required to be loaded just to fire a search query or to just index one document The requirements of such a system are. * Very efficient loading of cores . Solr cannot afford to read and parse and create Schema, SolrConfig Objects for each core each time the core has to be loaded ( SOLR-919 , SOLR-920) * START STOP core . Currently it is only possible to unload a core (SOLR-880) * Automatic loading of cores . If a core is present and it is not loaded and a request comes for that load it automatically before serving up a request * As there are a large no:of cores , all the cores cannot be kept loaded always. There has to be an upper limit beyond which we need to unload a few cores (probably the least recently used ones) * Automatic allotment of dataDir for cores. If the no:of cores is too high al the cores' dataDirs cannot live in the same dir. There is an upper limit on the no:of dirs you can create in a unix dir w/o affecting performance -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec
Robert Muir created LUCENE-4498: --- Summary: pulse docfreq=1 DOCS_ONLY for 4.1 codec Key: LUCENE-4498 URL: https://issues.apache.org/jira/browse/LUCENE-4498 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Robert Muir We have pulsing codec, but currently this has some downsides: * its very general, wrapping an arbitrary postingsformat and pulsing everything in the postings for an arbitrary docfreq/totalTermFreq cutoff * reuse is hairy: because it specializes its enums based on these cutoffs, when walking thru terms e.g. merging there is a lot of sophisticated stuff to avoid the worst cases where we clone indexinputs for tons of terms. On the other hand the way the 4.1 codec encodes primary key fields is pretty silly, we write the docStartFP vlong in the term dictionary metadata, which tells us where to seek in the .doc to read our one lonely vint. I think its worth investigating that in the DOCS_ONLY docfreq=1 case, we just write the lone doc delta where we would write docStartFP. We can avoid the hairy reuse problem too, by just supporting this in refillDocs() in BlockDocsEnum instead of specializing. This would remove the additional seek for primary key fields without really any of the downsides of pulsing today. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
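The encoding trick proposed here can be sketched as follows; the method and field names are made up for illustration and are not the patch itself. The point is that for a DOCS_ONLY term with docFreq == 1, the single doc delta is small enough to live in the slot where the .doc file pointer (docStartFP) would normally go, so reading the term needs no extra seek.

    // Hypothetical sketch of the term-metadata slot described above.
    final class PulsedTermMetadata {
      // What gets written in the term dictionary metadata for the doc stream:
      static long docMetadata(boolean docsOnly, int docFreq, long docStartFP, int singleDocDelta) {
        if (docsOnly && docFreq == 1) {
          return singleDocDelta;   // "pulsed": the lone posting is inlined, no seek into .doc
        }
        return docStartFP;         // normal case: file pointer to the term's block in .doc
      }
    }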
[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores
[ https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481380#comment-13481380 ] Noble Paul commented on SOLR-1293: -- If you wish to test the zk persistence feature should we just not use an embedded zk? Support for large no:of cores and faster loading/unloading of cores --- Key: SOLR-1293 URL: https://issues.apache.org/jira/browse/SOLR-1293 Project: Solr Issue Type: New Feature Components: multicore Reporter: Noble Paul Fix For: 4.1 Attachments: SOLR-1293.patch Solr , currently ,is not very suitable for a large no:of homogeneous cores where you require fast/frequent loading/unloading of cores . usually a core is required to be loaded just to fire a search query or to just index one document The requirements of such a system are. * Very efficient loading of cores . Solr cannot afford to read and parse and create Schema, SolrConfig Objects for each core each time the core has to be loaded ( SOLR-919 , SOLR-920) * START STOP core . Currently it is only possible to unload a core (SOLR-880) * Automatic loading of cores . If a core is present and it is not loaded and a request comes for that load it automatically before serving up a request * As there are a large no:of cores , all the cores cannot be kept loaded always. There has to be an upper limit beyond which we need to unload a few cores (probably the least recently used ones) * Automatic allotment of dataDir for cores. If the no:of cores is too high al the cores' dataDirs cannot live in the same dir. There is an upper limit on the no:of dirs you can create in a unix dir w/o affecting performance -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores
[ https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481386#comment-13481386 ] Jack Krupansky commented on SOLR-1293: -- bq. piggyback on zookeeper That's okay, but zk is optimized for a small amount of configuration info - 1 MB limit. Is large number times data per core going to be under 1 MB? Is large number supposed to be hundreds, thousands, tens of thousands, hundreds of thousands, millions, ...? I mean, if a web site had millions of users, could they have one loadable core per user? The use case should be more specific about the goals. Support for large no:of cores and faster loading/unloading of cores --- Key: SOLR-1293 URL: https://issues.apache.org/jira/browse/SOLR-1293 Project: Solr Issue Type: New Feature Components: multicore Reporter: Noble Paul Fix For: 4.1 Attachments: SOLR-1293.patch Solr , currently ,is not very suitable for a large no:of homogeneous cores where you require fast/frequent loading/unloading of cores . usually a core is required to be loaded just to fire a search query or to just index one document The requirements of such a system are. * Very efficient loading of cores . Solr cannot afford to read and parse and create Schema, SolrConfig Objects for each core each time the core has to be loaded ( SOLR-919 , SOLR-920) * START STOP core . Currently it is only possible to unload a core (SOLR-880) * Automatic loading of cores . If a core is present and it is not loaded and a request comes for that load it automatically before serving up a request * As there are a large no:of cores , all the cores cannot be kept loaded always. There has to be an upper limit beyond which we need to unload a few cores (probably the least recently used ones) * Automatic allotment of dataDir for cores. If the no:of cores is too high al the cores' dataDirs cannot live in the same dir. There is an upper limit on the no:of dirs you can create in a unix dir w/o affecting performance -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
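As a rough back-of-envelope illustration of that 1 MB concern (the per-core byte count here is an assumption, not something from the issue): if each core contributes on the order of 100 bytes of metadata (a name plus an instanceDir and dataDir), a single 1 MB znode holds roughly 10,000 cores' worth; hundreds of thousands or millions of cores would run to tens or hundreds of megabytes and would have to be split across many znodes rather than kept in one node.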
Example schema doc omission - omitPositions, omitTermFreqAndPositions, sortMissingFirst, and sortMissingLast
The Solr example schema says “<!-- Valid attributes for fields:”, but omits omitPositions, omitTermFreqAndPositions, sortMissingFirst, and sortMissingLast. It would also be helpful to have a clarifying note that distinguishes omitPositions and omitTermFreqAndPositions from termPositions and termVectors. I’m not positive, but is it simply that the omitXxx attributes control what gets indexed, versus the termXxx attributes controlling what can be retrieved, and that settings of the latter do not influence the former? -- Jack Krupansky
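One way to see the distinction being asked about, sketched at the Lucene FieldType level rather than as an authoritative reading of the Solr schema docs: the omitXxx attributes correspond to the index options of the postings, while the termXxx attributes enable a separate per-document term vector store that can be retrieved at search time; the two are set independently.

    // Sketch against the Lucene 4.x API; the class name is illustrative.
    import org.apache.lucene.document.FieldType;
    import org.apache.lucene.index.FieldInfo.IndexOptions;

    public class OmitVsTermVectors {
      public static FieldType example() {
        FieldType ft = new FieldType();
        ft.setIndexed(true);
        ft.setTokenized(true);
        // Roughly what omitTermFreqAndPositions="true" means for the inverted index:
        ft.setIndexOptions(IndexOptions.DOCS_ONLY);
        // Term vectors are a separate, retrievable per-document structure;
        // enabling them does not change what the postings above store.
        ft.setStoreTermVectors(true);
        ft.setStoreTermVectorPositions(true);
        ft.freeze();
        return ft;
      }
    }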
[jira] [Commented] (LUCENE-4006) system requirements is duplicated across versioned/unversioned
[ https://issues.apache.org/jira/browse/LUCENE-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481407#comment-13481407 ] Uwe Schindler commented on LUCENE-4006: --- I committed the changes to the already published versioned 4.0 website (after communication with the RM Robert Muir). I will later remove the global docs and only refer to the per-version docs. 3.6.1 versioned forrest docs already contained the system requirements, so those don't need to be changed. system requirements is duplicated across versioned/unversioned -- Key: LUCENE-4006 URL: https://issues.apache.org/jira/browse/LUCENE-4006 Project: Lucene - Core Issue Type: Task Components: general/javadocs Reporter: Robert Muir Assignee: Uwe Schindler Fix For: 4.1, 5.0, 4.0.1 Attachments: LUCENE-4006.patch Our System requirements page is located here on the unversioned site: http://lucene.apache.org/core/systemreqs.html But it's also in forrest under each release. Can we just nuke the forrested one? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores
[ https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481413#comment-13481413 ] Andrzej Rusin commented on SOLR-1293: - Whatever would be the storage of the cores info, it would be nice to have some API and/or command line tools for (batch) manipulating the cores; what do you think? Support for large no:of cores and faster loading/unloading of cores --- Key: SOLR-1293 URL: https://issues.apache.org/jira/browse/SOLR-1293 Project: Solr Issue Type: New Feature Components: multicore Reporter: Noble Paul Fix For: 4.1 Attachments: SOLR-1293.patch Solr , currently ,is not very suitable for a large no:of homogeneous cores where you require fast/frequent loading/unloading of cores . usually a core is required to be loaded just to fire a search query or to just index one document The requirements of such a system are. * Very efficient loading of cores . Solr cannot afford to read and parse and create Schema, SolrConfig Objects for each core each time the core has to be loaded ( SOLR-919 , SOLR-920) * START STOP core . Currently it is only possible to unload a core (SOLR-880) * Automatic loading of cores . If a core is present and it is not loaded and a request comes for that load it automatically before serving up a request * As there are a large no:of cores , all the cores cannot be kept loaded always. There has to be an upper limit beyond which we need to unload a few cores (probably the least recently used ones) * Automatic allotment of dataDir for cores. If the no:of cores is too high al the cores' dataDirs cannot live in the same dir. There is an upper limit on the no:of dirs you can create in a unix dir w/o affecting performance -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-4499) Multi-word synonym filter (synonym expansion at indexing time).
roman created LUCENE-4499: - Summary: Multi-word synonym filter (synonym expansion at indexing time). Key: LUCENE-4499 URL: https://issues.apache.org/jira/browse/LUCENE-4499 Project: Lucene - Core Issue Type: Improvement Components: core/other Affects Versions: 4.1, 5.0 Reporter: roman Priority: Minor Fix For: 5.0 I apologize for bringing the multi-token synonym expansion up again. There is an old, unresolved issue at LUCENE-1622 [1] While solving the problem for our needs [2], I discovered that the current SolrSynonym parser (and the wonderful FTS) have almost everything to satisfactorily handle both the query and index time synonym expansion. It seems that people often need to use the synonym filter *slightly* differently at indexing and query time. In our case, we must do different things during indexing and querying. Example sentence: Mirrors of the Hubble space telescope pointed at XA5 This is what we need (comma marks position bump): indexing: mirrors,hubble|hubble space telescope|hst,space,telescope,pointed,xa5|astroobject#5 querying: +mirrors +(hubble space telescope | hst) +pointed +(xa5|astroboject#5) This translated to following needs: indexing time: single-token synonyms = return only synonyms multi-token synonyms = return original tokens AND the synonyms We need the original tokens for the proximity queries, if we indexed 'hubble space telescope' as one token, we cannot search for 'hubble NEAR telescope' query time: single-token: return only its synonyms (but preserve case) multi-token: return only synonyms You may (not) be surprised, but Lucene already supports ALL these requirements. The patch is an attempt to state the problem differently. I am not sure if it is the best option, however it works perfectly for our needs and it seems it could work for general public too. Especially if the SynonymFilterFactory had a preconfigured sets of SynonymMapBuilders - and people could just choose what situation they use. links: [1] https://issues.apache.org/jira/browse/LUCENE-1622 [2] http://labs.adsabs.harvard.edu/trac/ads-invenio/ticket/158 [3] seems to have similar request: http://lucene.472066.n3.nabble.com/Proposal-Full-support-for-multi-word-synonyms-at-query-time-td4000522.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
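For readers who want to see the knob involved, a minimal sketch (not roman's patch) of building the same SynonymMap two ways with Lucene's existing builder: the includeOrig flag is essentially the index-time vs query-time split described above, keeping the original tokens alongside a multi-word synonym at index time and emitting only the synonyms at query time.

    import java.io.IOException;
    import org.apache.lucene.analysis.synonym.SynonymMap;
    import org.apache.lucene.util.CharsRef;

    public class SynonymMapSketch {
      public static SynonymMap build(boolean keepOriginals) throws IOException {
        SynonymMap.Builder builder = new SynonymMap.Builder(true); // dedup entries
        CharsRef phrase = SynonymMap.Builder.join(
            new String[] { "hubble", "space", "telescope" }, new CharsRef());
        // keepOriginals = true  -> index time: emit the original tokens AND "hst"
        // keepOriginals = false -> query time: emit only "hst"
        builder.add(phrase, new CharsRef("hst"), keepOriginals);
        return builder.build();
      }
    }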
[jira] [Updated] (LUCENE-4499) Multi-word synonym filter (synonym expansion)
[ https://issues.apache.org/jira/browse/LUCENE-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] roman updated LUCENE-4499: -- Summary: Multi-word synonym filter (synonym expansion) (was: Multi-word synonym filter (synonym expansion at indexing time).) Multi-word synonym filter (synonym expansion) - Key: LUCENE-4499 URL: https://issues.apache.org/jira/browse/LUCENE-4499 Project: Lucene - Core Issue Type: Improvement Components: core/other Affects Versions: 4.1, 5.0 Reporter: roman Priority: Minor Labels: analysis, multi-word, synonyms Fix For: 5.0 I apologize for bringing the multi-token synonym expansion up again. There is an old, unresolved issue at LUCENE-1622 [1] While solving the problem for our needs [2], I discovered that the current SolrSynonym parser (and the wonderful FTS) have almost everything to satisfactorily handle both the query and index time synonym expansion. It seems that people often need to use the synonym filter *slightly* differently at indexing and query time. In our case, we must do different things during indexing and querying. Example sentence: Mirrors of the Hubble space telescope pointed at XA5 This is what we need (comma marks position bump): indexing: mirrors,hubble|hubble space telescope|hst,space,telescope,pointed,xa5|astroobject#5 querying: +mirrors +(hubble space telescope | hst) +pointed +(xa5|astroboject#5) This translated to following needs: indexing time: single-token synonyms = return only synonyms multi-token synonyms = return original tokens AND the synonyms We need the original tokens for the proximity queries, if we indexed 'hubble space telescope' as one token, we cannot search for 'hubble NEAR telescope' query time: single-token: return only its synonyms (but preserve case) multi-token: return only synonyms You may (not) be surprised, but Lucene already supports ALL these requirements. The patch is an attempt to state the problem differently. I am not sure if it is the best option, however it works perfectly for our needs and it seems it could work for general public too. Especially if the SynonymFilterFactory had a preconfigured sets of SynonymMapBuilders - and people could just choose what situation they use. links: [1] https://issues.apache.org/jira/browse/LUCENE-1622 [2] http://labs.adsabs.harvard.edu/trac/ads-invenio/ticket/158 [3] seems to have similar request: http://lucene.472066.n3.nabble.com/Proposal-Full-support-for-multi-word-synonyms-at-query-time-td4000522.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4496) Don't decode unnecessary freq blocks in 4.1 codec
[ https://issues.apache.org/jira/browse/LUCENE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4496: Attachment: LUCENE-4496.patch Same patch, adding a few comments and beefing up TestBlockPostingsFormat3 to also check the freqs case. I'll commit this shortly after running some more tests, and I think I want to now yank TestBlockPostingsFormat3 out of this package and let it run with any codec, it just tests these various subset cases and isnt specific to this PF. Don't decode unnecessary freq blocks in 4.1 codec - Key: LUCENE-4496 URL: https://issues.apache.org/jira/browse/LUCENE-4496 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Affects Versions: 4.1 Reporter: Robert Muir Attachments: LUCENE-4496.patch, LUCENE-4496.patch, LUCENE-4496.patch TermsEnum.docs() has an expert flag to specify you don't require frequencies. This is currently set by some things that don't need it: we should call ForUtil.skipBlock instead of ForUtil.readBlock in this case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
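The caller-side flag this optimization keys off of, shown as a small usage sketch (the helper class is illustrative, not part of the patch): passing DocsEnum.FLAG_NONE declares that freq() will never be called, which is what lets the postings reader skip the freq blocks instead of decoding them.

    import java.io.IOException;
    import org.apache.lucene.index.DocsEnum;
    import org.apache.lucene.index.TermsEnum;
    import org.apache.lucene.search.DocIdSetIterator;

    public class DocsOnlyIteration {
      public static int countDocs(TermsEnum termsEnum) throws IOException {
        // null liveDocs, no reuse, and no freqs requested:
        DocsEnum docs = termsEnum.docs(null, null, DocsEnum.FLAG_NONE);
        int count = 0;
        while (docs.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
          count++;
        }
        return count;
      }
    }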
[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores
[ https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481428#comment-13481428 ] Erick Erickson commented on SOLR-1293: -- Well, I don't think the use-case I'm working on needs an API or command-line tools, so I probably won't be working on it. I'd be glad to commit it in if someone else wanted to do it. Support for large no:of cores and faster loading/unloading of cores --- Key: SOLR-1293 URL: https://issues.apache.org/jira/browse/SOLR-1293 Project: Solr Issue Type: New Feature Components: multicore Reporter: Noble Paul Fix For: 4.1 Attachments: SOLR-1293.patch Solr , currently ,is not very suitable for a large no:of homogeneous cores where you require fast/frequent loading/unloading of cores . usually a core is required to be loaded just to fire a search query or to just index one document The requirements of such a system are. * Very efficient loading of cores . Solr cannot afford to read and parse and create Schema, SolrConfig Objects for each core each time the core has to be loaded ( SOLR-919 , SOLR-920) * START STOP core . Currently it is only possible to unload a core (SOLR-880) * Automatic loading of cores . If a core is present and it is not loaded and a request comes for that load it automatically before serving up a request * As there are a large no:of cores , all the cores cannot be kept loaded always. There has to be an upper limit beyond which we need to unload a few cores (probably the least recently used ones) * Automatic allotment of dataDir for cores. If the no:of cores is too high al the cores' dataDirs cannot live in the same dir. There is an upper limit on the no:of dirs you can create in a unix dir w/o affecting performance -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4499) Multi-word synonym filter (synonym expansion)
[ https://issues.apache.org/jira/browse/LUCENE-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] roman updated LUCENE-4499: -- Description: I apologize for bringing the multi-token synonym expansion up again. There is an old, unresolved issue at LUCENE-1622 [1] While solving the problem for our needs [2], I discovered that the current SolrSynonym parser (and the wonderful FTS) have almost everything to satisfactorily handle both the query and index time synonym expansion. It seems that people often need to use the synonym filter *slightly* differently at indexing and query time. In our case, we must do different things during indexing and querying. Example sentence: Mirrors of the Hubble space telescope pointed at XA5 This is what we need (comma marks position bump): indexing: mirrors,hubble|hubble space telescope|hst,space,telescope,pointed,xa5|astroobject#5 querying: +mirrors +(hubble space telescope | hst) +pointed +(xa5|astroboject#5) This translated to following needs: indexing time: single-token synonyms = return only synonyms multi-token synonyms = return original tokens *AND* the synonyms query time: single-token: return only synonyms (but preserve case) multi-token: return only synonyms We need the original tokens for the proximity queries, if we indexed 'hubble space telescope' as one token, we cannot search for 'hubble NEAR telescope' You may (not) be surprised, but Lucene already supports ALL of these requirements. The patch is an attempt to state the problem differently. I am not sure if it is the best option, however it works perfectly for our needs and it seems it could work for general public too. Especially if the SynonymFilterFactory had a preconfigured sets of SynonymMapBuilders - and people would just choose what situation they use. Please look at the unittest. links: [1] https://issues.apache.org/jira/browse/LUCENE-1622 [2] http://labs.adsabs.harvard.edu/trac/ads-invenio/ticket/158 [3] seems to have similar request: http://lucene.472066.n3.nabble.com/Proposal-Full-support-for-multi-word-synonyms-at-query-time-td4000522.html was: I apologize for bringing the multi-token synonym expansion up again. There is an old, unresolved issue at LUCENE-1622 [1] While solving the problem for our needs [2], I discovered that the current SolrSynonym parser (and the wonderful FTS) have almost everything to satisfactorily handle both the query and index time synonym expansion. It seems that people often need to use the synonym filter *slightly* differently at indexing and query time. In our case, we must do different things during indexing and querying. Example sentence: Mirrors of the Hubble space telescope pointed at XA5 This is what we need (comma marks position bump): indexing: mirrors,hubble|hubble space telescope|hst,space,telescope,pointed,xa5|astroobject#5 querying: +mirrors +(hubble space telescope | hst) +pointed +(xa5|astroboject#5) This translated to following needs: indexing time: single-token synonyms = return only synonyms multi-token synonyms = return original tokens AND the synonyms We need the original tokens for the proximity queries, if we indexed 'hubble space telescope' as one token, we cannot search for 'hubble NEAR telescope' query time: single-token: return only its synonyms (but preserve case) multi-token: return only synonyms You may (not) be surprised, but Lucene already supports ALL these requirements. The patch is an attempt to state the problem differently. 
I am not sure if it is the best option, however it works perfectly for our needs and it seems it could work for general public too. Especially if the SynonymFilterFactory had a preconfigured sets of SynonymMapBuilders - and people could just choose what situation they use. links: [1] https://issues.apache.org/jira/browse/LUCENE-1622 [2] http://labs.adsabs.harvard.edu/trac/ads-invenio/ticket/158 [3] seems to have similar request: http://lucene.472066.n3.nabble.com/Proposal-Full-support-for-multi-word-synonyms-at-query-time-td4000522.html Multi-word synonym filter (synonym expansion) - Key: LUCENE-4499 URL: https://issues.apache.org/jira/browse/LUCENE-4499 Project: Lucene - Core Issue Type: Improvement Components: core/other Affects Versions: 4.1, 5.0 Reporter: roman Priority: Minor Labels: analysis, multi-word, synonyms Fix For: 5.0 I apologize for bringing the multi-token synonym expansion up again. There is an old, unresolved issue at LUCENE-1622 [1] While solving the problem for our needs [2], I discovered that the current SolrSynonym parser (and the wonderful FTS) have almost everything to satisfactorily handle both the query and index time synonym
[jira] [Updated] (LUCENE-4499) Multi-word synonym filter (synonym expansion)
[ https://issues.apache.org/jira/browse/LUCENE-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] roman updated LUCENE-4499: -- Attachment: LUCENE-4499.patch patch against latest trunk, i am seeing some unrelated unittests failing Multi-word synonym filter (synonym expansion) - Key: LUCENE-4499 URL: https://issues.apache.org/jira/browse/LUCENE-4499 Project: Lucene - Core Issue Type: Improvement Components: core/other Affects Versions: 4.1, 5.0 Reporter: roman Priority: Minor Labels: analysis, multi-word, synonyms Fix For: 5.0 Attachments: LUCENE-4499.patch I apologize for bringing the multi-token synonym expansion up again. There is an old, unresolved issue at LUCENE-1622 [1] While solving the problem for our needs [2], I discovered that the current SolrSynonym parser (and the wonderful FTS) have almost everything to satisfactorily handle both the query and index time synonym expansion. It seems that people often need to use the synonym filter *slightly* differently at indexing and query time. In our case, we must do different things during indexing and querying. Example sentence: Mirrors of the Hubble space telescope pointed at XA5 This is what we need (comma marks position bump): indexing: mirrors,hubble|hubble space telescope|hst,space,telescope,pointed,xa5|astroobject#5 querying: +mirrors +(hubble space telescope | hst) +pointed +(xa5|astroboject#5) This translated to following needs: indexing time: single-token synonyms = return only synonyms multi-token synonyms = return original tokens *AND* the synonyms query time: single-token: return only synonyms (but preserve case) multi-token: return only synonyms We need the original tokens for the proximity queries, if we indexed 'hubble space telescope' as one token, we cannot search for 'hubble NEAR telescope' You may (not) be surprised, but Lucene already supports ALL of these requirements. The patch is an attempt to state the problem differently. I am not sure if it is the best option, however it works perfectly for our needs and it seems it could work for general public too. Especially if the SynonymFilterFactory had a preconfigured sets of SynonymMapBuilders - and people would just choose what situation they use. Please look at the unittest. links: [1] https://issues.apache.org/jira/browse/LUCENE-1622 [2] http://labs.adsabs.harvard.edu/trac/ads-invenio/ticket/158 [3] seems to have similar request: http://lucene.472066.n3.nabble.com/Proposal-Full-support-for-multi-word-synonyms-at-query-time-td4000522.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4496) Don't decode unnecessary freq blocks in 4.1 codec
[ https://issues.apache.org/jira/browse/LUCENE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481437#comment-13481437 ] Robert Muir commented on LUCENE-4496: - I committed to trunk... will give it some time in jenkins before backporting. Don't decode unnecessary freq blocks in 4.1 codec - Key: LUCENE-4496 URL: https://issues.apache.org/jira/browse/LUCENE-4496 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Affects Versions: 4.1 Reporter: Robert Muir Attachments: LUCENE-4496.patch, LUCENE-4496.patch, LUCENE-4496.patch TermsEnum.docs() has an expert flag to specify you don't require frequencies. This is currently set by some things that don't need it: we should call ForUtil.skipBlock instead of ForUtil.readBlock in this case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4497) Don't write posVIntCount in 4.1 codec
[ https://issues.apache.org/jira/browse/LUCENE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4497: Attachment: LUCENE-4497.patch updated patch to trunk. This is actually a nice little savings to the positions file with the luceneutil 1M collection. trunk: 116425749 bytes patch: 111340216 bytes Don't write posVIntCount in 4.1 codec - Key: LUCENE-4497 URL: https://issues.apache.org/jira/browse/LUCENE-4497 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Robert Muir Attachments: LUCENE-4497.patch, LUCENE-4497.patch Its confusing and unnecessary that we compute this from docFreq for the doc/freq vint count, but write it for the positions case: its totalTermFreq % BLOCK_SIZE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
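Spelled out, that saving is 116,425,749 − 111,340,216 = 5,085,533 bytes, i.e. roughly 4.9 MB, or about a 4.4% reduction in the positions file for that 1M-document collection.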
[jira] [Commented] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec
[ https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481480#comment-13481480 ] Robert Muir commented on LUCENE-4498: - I will work on a patch after LUCENE-4497 has been reviewed... ive already conflicted myself with this PF today :) pulse docfreq=1 DOCS_ONLY for 4.1 codec --- Key: LUCENE-4498 URL: https://issues.apache.org/jira/browse/LUCENE-4498 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Robert Muir We have pulsing codec, but currently this has some downsides: * its very general, wrapping an arbitrary postingsformat and pulsing everything in the postings for an arbitrary docfreq/totalTermFreq cutoff * reuse is hairy: because it specializes its enums based on these cutoffs, when walking thru terms e.g. merging there is a lot of sophisticated stuff to avoid the worst cases where we clone indexinputs for tons of terms. On the other hand the way the 4.1 codec encodes primary key fields is pretty silly, we write the docStartFP vlong in the term dictionary metadata, which tells us where to seek in the .doc to read our one lonely vint. I think its worth investigating that in the DOCS_ONLY docfreq=1 case, we just write the lone doc delta where we would write docStartFP. We can avoid the hairy reuse problem too, by just supporting this in refillDocs() in BlockDocsEnum instead of specializing. This would remove the additional seek for primary key fields without really any of the downsides of pulsing today. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4497) Don't write posVIntCount in 4.1 codec
[ https://issues.apache.org/jira/browse/LUCENE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481488#comment-13481488 ] Michael McCandless commented on LUCENE-4497: +1, nice! Don't write posVIntCount in 4.1 codec - Key: LUCENE-4497 URL: https://issues.apache.org/jira/browse/LUCENE-4497 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Robert Muir Attachments: LUCENE-4497.patch, LUCENE-4497.patch Its confusing and unnecessary that we compute this from docFreq for the doc/freq vint count, but write it for the positions case: its totalTermFreq % BLOCK_SIZE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec
[ https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481514#comment-13481514 ] Michael McCandless commented on LUCENE-4498: +1 pulse docfreq=1 DOCS_ONLY for 4.1 codec --- Key: LUCENE-4498 URL: https://issues.apache.org/jira/browse/LUCENE-4498 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Robert Muir We have pulsing codec, but currently this has some downsides: * its very general, wrapping an arbitrary postingsformat and pulsing everything in the postings for an arbitrary docfreq/totalTermFreq cutoff * reuse is hairy: because it specializes its enums based on these cutoffs, when walking thru terms e.g. merging there is a lot of sophisticated stuff to avoid the worst cases where we clone indexinputs for tons of terms. On the other hand the way the 4.1 codec encodes primary key fields is pretty silly, we write the docStartFP vlong in the term dictionary metadata, which tells us where to seek in the .doc to read our one lonely vint. I think its worth investigating that in the DOCS_ONLY docfreq=1 case, we just write the lone doc delta where we would write docStartFP. We can avoid the hairy reuse problem too, by just supporting this in refillDocs() in BlockDocsEnum instead of specializing. This would remove the additional seek for primary key fields without really any of the downsides of pulsing today. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores
[ https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481536#comment-13481536 ] Noble Paul commented on SOLR-1293: -- bq. Is large number supposed to be hundreds, thousands, tens of thousands, hundreds of thousands, millions, ...? I'll be surprised if it ever crosses a few 1's . But let us say the upper limit sa a 10 , shouldn't it be simple to keep in ZK? Support for large no:of cores and faster loading/unloading of cores --- Key: SOLR-1293 URL: https://issues.apache.org/jira/browse/SOLR-1293 Project: Solr Issue Type: New Feature Components: multicore Reporter: Noble Paul Fix For: 4.1 Attachments: SOLR-1293.patch Solr , currently ,is not very suitable for a large no:of homogeneous cores where you require fast/frequent loading/unloading of cores . usually a core is required to be loaded just to fire a search query or to just index one document The requirements of such a system are. * Very efficient loading of cores . Solr cannot afford to read and parse and create Schema, SolrConfig Objects for each core each time the core has to be loaded ( SOLR-919 , SOLR-920) * START STOP core . Currently it is only possible to unload a core (SOLR-880) * Automatic loading of cores . If a core is present and it is not loaded and a request comes for that load it automatically before serving up a request * As there are a large no:of cores , all the cores cannot be kept loaded always. There has to be an upper limit beyond which we need to unload a few cores (probably the least recently used ones) * Automatic allotment of dataDir for cores. If the no:of cores is too high al the cores' dataDirs cannot live in the same dir. There is an upper limit on the no:of dirs you can create in a unix dir w/o affecting performance -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4497) Don't write posVIntCount in 4.1 codec
[ https://issues.apache.org/jira/browse/LUCENE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481551#comment-13481551 ] Robert Muir commented on LUCENE-4497: - I committed to trunk. will bake for a bit before backporting. Don't write posVIntCount in 4.1 codec - Key: LUCENE-4497 URL: https://issues.apache.org/jira/browse/LUCENE-4497 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Robert Muir Attachments: LUCENE-4497.patch, LUCENE-4497.patch Its confusing and unnecessary that we compute this from docFreq for the doc/freq vint count, but write it for the positions case: its totalTermFreq % BLOCK_SIZE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec
[ https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481553#comment-13481553 ] Robert Muir commented on LUCENE-4498: - Actually I think for the other cases (not just DOCS_ONLY) we can pulse when totalTermFreq=1, as the freq is implicit. We can just leave the positions and what not where they are. I'll see how ugly it is... pulse docfreq=1 DOCS_ONLY for 4.1 codec --- Key: LUCENE-4498 URL: https://issues.apache.org/jira/browse/LUCENE-4498 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Robert Muir We have pulsing codec, but currently this has some downsides: * its very general, wrapping an arbitrary postingsformat and pulsing everything in the postings for an arbitrary docfreq/totalTermFreq cutoff * reuse is hairy: because it specializes its enums based on these cutoffs, when walking thru terms e.g. merging there is a lot of sophisticated stuff to avoid the worst cases where we clone indexinputs for tons of terms. On the other hand the way the 4.1 codec encodes primary key fields is pretty silly, we write the docStartFP vlong in the term dictionary metadata, which tells us where to seek in the .doc to read our one lonely vint. I think its worth investigating that in the DOCS_ONLY docfreq=1 case, we just write the lone doc delta where we would write docStartFP. We can avoid the hairy reuse problem too, by just supporting this in refillDocs() in BlockDocsEnum instead of specializing. This would remove the additional seek for primary key fields without really any of the downsides of pulsing today. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
heads up: reindex trunk indexes
I committed https://issues.apache.org/jira/browse/LUCENE-4497. You should reindex. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-3977) Add [* TO *] option to spatial fields.
David Smiley created SOLR-3977: -- Summary: Add [* TO *] option to spatial fields. Key: SOLR-3977 URL: https://issues.apache.org/jira/browse/SOLR-3977 Project: Solr Issue Type: New Feature Reporter: David Smiley Priority: Minor It would be nice to have [* TO *] work on a spatial field. Not necessarily any range query but this specific one. I don't know if there are other non-spatial fields where this won't work, but it'd be nice if this was universal. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2305) DataImportScheduler
[ https://issues.apache.org/jira/browse/SOLR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481621#comment-13481621 ] Marko Bonaci commented on SOLR-2305: [~otis] Got it! Will do... DataImportScheduler --- Key: SOLR-2305 URL: https://issues.apache.org/jira/browse/SOLR-2305 Project: Solr Issue Type: New Feature Affects Versions: 4.0-ALPHA Reporter: Bill Bell Fix For: 4.1 Attachments: patch.txt, SOLR-2305-1.diff Marko Bonaci has updated the WIKI page to add the DataImportScheduler, but I cannot find a JIRA ticket for it? http://wiki.apache.org/solr/DataImportHandler Do we have a ticket so the code can be tracked? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward updated SOLR-1972: Attachment: SOLR-1972_metrics.patch Need additional query stats in admin interface - median, 95th and 99th percentile - Key: SOLR-1972 URL: https://issues.apache.org/jira/browse/SOLR-1972 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Shawn Heisey Priority: Minor Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972-url_pattern.patch I would like to see more detailed query statistics from the admin GUI. This is what you can get now: requests : 809 errors : 0 timeouts : 0 totalTime : 70053 avgTimePerRequest : 86.59209 avgRequestsPerSecond : 0.8148785 I'd like to see more data on the time per request - median, 95th percentile, 99th percentile, and any other statistical function that makes sense to include. In my environment, the first bunch of queries after startup tend to take several seconds each. I find that the average value tends to be useless until it has several thousand queries under its belt and the caches are thoroughly warmed. The statistical functions I have mentioned would quickly eliminate the influence of those initial slow queries. The system will have to store individual data about each query. I don't know if this is something Solr does already. It would be nice to have a configurable count of how many of the most recent data points are kept, to control the amount of memory the feature uses. The default value could be something like 1024 or 4096. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec
[ https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4498: Attachment: LUCENE-4498.patch Initial patch (no file format docs yet, lets benchmark/measure first). All tests pass. pulse docfreq=1 DOCS_ONLY for 4.1 codec --- Key: LUCENE-4498 URL: https://issues.apache.org/jira/browse/LUCENE-4498 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Robert Muir Attachments: LUCENE-4498.patch We have pulsing codec, but currently this has some downsides: * its very general, wrapping an arbitrary postingsformat and pulsing everything in the postings for an arbitrary docfreq/totalTermFreq cutoff * reuse is hairy: because it specializes its enums based on these cutoffs, when walking thru terms e.g. merging there is a lot of sophisticated stuff to avoid the worst cases where we clone indexinputs for tons of terms. On the other hand the way the 4.1 codec encodes primary key fields is pretty silly, we write the docStartFP vlong in the term dictionary metadata, which tells us where to seek in the .doc to read our one lonely vint. I think its worth investigating that in the DOCS_ONLY docfreq=1 case, we just write the lone doc delta where we would write docStartFP. We can avoid the hairy reuse problem too, by just supporting this in refillDocs() in BlockDocsEnum instead of specializing. This would remove the additional seek for primary key fields without really any of the downsides of pulsing today. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481655#comment-13481655 ] Alan Woodward commented on SOLR-1972: - Here's a patch that uses the metrics library. It doesn't include Eric's regex matching or anything at the moment - it basically just takes what's currently in trunk, refactors it to use metrics' Counter and Timer objects, and adds the rolling average data. Cons: - it adds another dependency to solr-core. It's a useful dependency, IMO, but still. - tests don't pass at the moment, as metrics spawns extra threads which the test runner doesn't know how to deal with Pros: - it's a purpose-designed stats and metrics library, so we don't need to worry about the maths or sampling algorithms - it adds the functionality of the original ticket/patch in a much simpler way. The ideal solution would be a component of some kind, I think, but this at least improves on what's in trunk at the moment. Need additional query stats in admin interface - median, 95th and 99th percentile - Key: SOLR-1972 URL: https://issues.apache.org/jira/browse/SOLR-1972 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Shawn Heisey Priority: Minor Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972-url_pattern.patch I would like to see more detailed query statistics from the admin GUI. This is what you can get now: requests : 809 errors : 0 timeouts : 0 totalTime : 70053 avgTimePerRequest : 86.59209 avgRequestsPerSecond : 0.8148785 I'd like to see more data on the time per request - median, 95th percentile, 99th percentile, and any other statistical function that makes sense to include. In my environment, the first bunch of queries after startup tend to take several seconds each. I find that the average value tends to be useless until it has several thousand queries under its belt and the caches are thoroughly warmed. The statistical functions I have mentioned would quickly eliminate the influence of those initial slow queries. The system will have to store individual data about each query. I don't know if this is something Solr does already. It would be nice to have a configurable count of how many of the most recent data points are kept, to control the amount of memory the feature uses. The default value could be something like 1024 or 4096. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
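A sketch of what the Counter/Timer refactoring looks like from a handler's point of view, assuming the Yammer metrics 2.x ("com.yammer.metrics") API; the class and metric names here are illustrative and not taken from the patch. The appeal is that the median/95th/99th percentiles the original ticket asks for come straight out of the Timer's reservoir-sampled snapshot, with no need to keep every data point.

    import java.util.concurrent.TimeUnit;
    import com.yammer.metrics.Metrics;
    import com.yammer.metrics.core.Counter;
    import com.yammer.metrics.core.Timer;
    import com.yammer.metrics.core.TimerContext;
    import com.yammer.metrics.stats.Snapshot;

    public class RequestStatsSketch {
      private final Counter requests =
          Metrics.newCounter(RequestStatsSketch.class, "requests");
      private final Timer requestTimes =
          Metrics.newTimer(RequestStatsSketch.class, "requestTimes",
                           TimeUnit.MILLISECONDS, TimeUnit.SECONDS);

      public void handleRequest(Runnable actualWork) {
        requests.inc();
        TimerContext ctx = requestTimes.time();   // reservoir-sampled timing
        try {
          actualWork.run();
        } finally {
          ctx.stop();
        }
      }

      public String statsLine() {
        Snapshot s = requestTimes.getSnapshot(); // percentiles without storing every request
        return "median=" + s.getMedian()
             + " p95=" + s.get95thPercentile()
             + " p99=" + s.get99thPercentile();
      }
    }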
[jira] [Updated] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec
[ https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4498: Attachment: LUCENE-4498.patch Duh, I forgot to actually skip the seek in the previous patch: here's the updated patch. pulse docfreq=1 DOCS_ONLY for 4.1 codec --- Key: LUCENE-4498 URL: https://issues.apache.org/jira/browse/LUCENE-4498 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Robert Muir Attachments: LUCENE-4498.patch, LUCENE-4498.patch We have the pulsing codec, but currently it has some downsides: * it's very general, wrapping an arbitrary PostingsFormat and pulsing everything in the postings for an arbitrary docfreq/totalTermFreq cutoff * reuse is hairy: because it specializes its enums based on these cutoffs, when walking through terms (e.g. merging) there is a lot of sophisticated machinery to avoid the worst cases where we clone IndexInputs for tons of terms. On the other hand, the way the 4.1 codec encodes primary key fields is pretty silly: we write the docStartFP vlong in the term dictionary metadata, which tells us where to seek in the .doc to read our one lonely vint. I think it's worth investigating whether, in the DOCS_ONLY docfreq=1 case, we just write the lone doc delta where we would write docStartFP. We can avoid the hairy reuse problem too, by just supporting this in refillDocs() in BlockDocsEnum instead of specializing. This would remove the additional seek for primary key fields without really any of the downsides of pulsing today. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError
[ https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481696#comment-13481696 ] Aaron Daubman commented on SOLR-3849: - This appears to still be affecting me in 4_0_0 (1400746) Running under OS X 10.8.2 with $ java -version java version 1.7.0_09 Java(TM) SE Runtime Environment (build 1.7.0_09-b05) Java HotSpot(TM) 64-Bit Server VM (build 23.5-b02, mixed mode) ---snip--- $ ant test -Dtestcase=ScriptEngineTest ... common.test: [junit4:junit4] JUnit4 says مرحبا! Master seed: 4050036B906720D2 [junit4:junit4] Executing 1 suite with 1 JVM. [junit4:junit4] [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby [junit4:junit4] Assumption #1: got: [null], expected: each not null [junit4:junit4] OK 0.11s | ScriptEngineTest.testPut [junit4:junit4] OK 0.01s | ScriptEngineTest.testEvalReader [junit4:junit4] OK 0.01s | ScriptEngineTest.testEvalText [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByExtension [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByName [junit4:junit4] 2 NOTE: test params are: codec=Lucene3x, sim=RandomSimilarityProvider(queryNorm=true,coord=crazy): {}, locale=es_DO, timezone=America/Godthab [junit4:junit4] 2 NOTE: Mac OS X 10.8.2 x86_64/Oracle Corporation 1.7.0_09 (64-bit)/cpus=8,threads=1,free=2966056,total=12320768 [junit4:junit4] 2 NOTE: All tests run in this JVM: [ScriptEngineTest] [junit4:junit4] 2 NOTE: reproduce with: ant test -Dtestcase=ScriptEngineTest -Dtests.seed=4050036B906720D2 -Dtests.slow=true -Dtests.locale=es_DO -Dtests.timezone=America/Godthab -Dtests.file.encoding=US-ASCII [junit4:junit4] ERROR 0.00s | ScriptEngineTest (suite) [junit4:junit4] Throwable #1: java.lang.AssertionError: System properties invariant violated. [junit4:junit4] New keys: [junit4:junit4] sun.awt.enableExtraMouseButtons=true [junit4:junit4] sun.font.fontmanager=sun.font.CFontManager [junit4:junit4] [junit4:junit4]at __randomizedtesting.SeedInfo.seed([4050036B906720D2]:0) [junit4:junit4]at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66) [junit4:junit4]at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4]at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4]at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4]at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) [junit4:junit4]at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) [junit4:junit4]at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) [junit4:junit4]at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) [junit4:junit4]at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4]at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) [junit4:junit4]at java.lang.Thread.run(Thread.java:722) [junit4:junit4] Completed in 1.14s, 6 tests, 1 failure, 1 skipped FAILURES! 
[junit4:junit4] [junit4:junit4] [junit4:junit4] Tests with failures: [junit4:junit4] - org.apache.solr.update.processor.ScriptEngineTest (suite) [junit4:junit4] [junit4:junit4] [junit4:junit4] JVM J0: 0.92 .. 2.86 = 1.94s [junit4:junit4] Execution time total: 2.96 sec. [junit4:junit4] Tests summary: 1 suite, 6 tests, 1 suite-level error, 1 ignored (1 assumption) BUILD FAILED /Users/adaubman/Projects/lucene_solr_4_0_0/build.xml:40: The following error occurred while executing this line: /Users/adaubman/Projects/lucene_solr_4_0_0/solr/build.xml:179: The following error occurred while executing this line: /Users/adaubman/Projects/lucene_solr_4_0_0/lucene/module-build.xml:63: The following error occurred while executing this line: /Users/adaubman/Projects/lucene_solr_4_0_0/lucene/common-build.xml:1142: The following error occurred while executing this line: /Users/adaubman/Projects/lucene_solr_4_0_0/lucene/common-build.xml:815: There were test failures: 1 suite, 6 tests, 1 suite-level error, 1 ignored (1 assumption) Total time: 24 seconds ---snip--- ScriptEngineTest failure RE system properties and ThreadLeakError - Key: SOLR-3849 URL:
[jira] [Updated] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec
[ https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4498: Attachment: LUCENE-4498_lazy.patch Here is a patch with a lazy clone() of the docs enum: when someone isn't reusing the docs enum (e.g. doing term queries or whatever), they won't pay the price of NIOFS buffer reads etc. just for a primary key. pulse docfreq=1 DOCS_ONLY for 4.1 codec --- Key: LUCENE-4498 URL: https://issues.apache.org/jira/browse/LUCENE-4498 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Robert Muir Attachments: LUCENE-4498_lazy.patch, LUCENE-4498.patch, LUCENE-4498.patch We have the pulsing codec, but currently it has some downsides: * it's very general, wrapping an arbitrary PostingsFormat and pulsing everything in the postings for an arbitrary docfreq/totalTermFreq cutoff * reuse is hairy: because it specializes its enums based on these cutoffs, when walking through terms (e.g. merging) there is a lot of sophisticated machinery to avoid the worst cases where we clone IndexInputs for tons of terms. On the other hand, the way the 4.1 codec encodes primary key fields is pretty silly: we write the docStartFP vlong in the term dictionary metadata, which tells us where to seek in the .doc to read our one lonely vint. I think it's worth investigating whether, in the DOCS_ONLY docfreq=1 case, we just write the lone doc delta where we would write docStartFP. We can avoid the hairy reuse problem too, by just supporting this in refillDocs() in BlockDocsEnum instead of specializing. This would remove the additional seek for primary key fields without really any of the downsides of pulsing today. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
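The lazy-clone trick itself is a plain lazy-initialization pattern. A minimal sketch follows (not the actual BlockDocsEnum code; the Supplier indirection and names are only for illustration):
{code:java}
import java.util.function.Supplier;

// Defer the expensive clone (e.g. of the postings IndexInput) until something
// actually iterates; a one-off term lookup that never advances skips it entirely.
final class LazyCloneSketch<T> {
  private final Supplier<T> expensiveClone; // assumption: e.g. () -> startDocIn.clone()
  private T instance;

  LazyCloneSketch(Supplier<T> expensiveClone) {
    this.expensiveClone = expensiveClone;
  }

  T get() {
    if (instance == null) {
      instance = expensiveClone.get(); // happens at most once, and only if needed
    }
    return instance;
  }
}
{code}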
[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError
[ https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481716#comment-13481716 ] Steven Rowe commented on SOLR-3849: --- I see the exact same failure on OS X 10.8.2 w/ Java 1.7.0_07. However, this test succeeds w/ Java 1.6.0_37. ScriptEngineTest failure RE system properties and ThreadLeakError - Key: SOLR-3849 URL: https://issues.apache.org/jira/browse/SOLR-3849 Project: Solr Issue Type: Bug Components: update Affects Versions: 5.0 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 Reporter: David Smiley Assignee: Uwe Schindler Fix For: 4.0, 5.0 Attachments: SOLR-3849.patch 100% reproducible for me: solr$ ant test -Dtestcase=ScriptEngineTest {noformat} [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4 [junit4:junit4] Executing 1 suite with 1 JVM. [junit4:junit4] [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest [junit4:junit4] OK 0.17s | ScriptEngineTest.testPut [junit4:junit4] OK 0.02s | ScriptEngineTest.testEvalReader [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby [junit4:junit4] Assumption #1: got: [null], expected: each not null [junit4:junit4] OK 0.01s | ScriptEngineTest.testEvalText [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByExtension [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByName [junit4:junit4] 2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING Will linger awaiting termination of 2 leaked thread(s). [junit4:junit4] 2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 thread leaked from SUITE scope at org.apache.solr.update.processor.ScriptEngineTest: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll Starting to interrupt leaked threads: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE There are still zombie threads that couldn't be terminated: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 NOTE: test params are: codec=SimpleText, sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, timezone=America/Edmonton [junit4:junit4] 2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 [junit4:junit4] 2 NOTE: All tests run in this JVM: [ScriptEngineTest] [junit4:junit4] 2 NOTE: reproduce with: ant test -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton -Dtests.file.encoding=UTF-8 [junit4:junit4] ERROR 0.00s | ScriptEngineTest (suite) [junit4:junit4] Throwable #1: java.lang.AssertionError: System properties invariant violated. 
[junit4:junit4] New keys: [junit4:junit4] sun.awt.enableExtraMouseButtons=true [junit4:junit4] sun.font.fontmanager=sun.font.CFontManager [junit4:junit4] [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) [junit4:junit4] at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) [junit4:junit4] at java.lang.Thread.run(Thread.java:722) [junit4:junit4]
[jira] [Created] (LUCENE-4500) Loosen up DirectSpellChecker's minPrefix requirements
Erik Hatcher created LUCENE-4500: Summary: Loosen up DirectSpellChecker's minPrefix requirements Key: LUCENE-4500 URL: https://issues.apache.org/jira/browse/LUCENE-4500 Project: Lucene - Core Issue Type: Improvement Affects Versions: 4.0 Reporter: Erik Hatcher Priority: Minor DirectSpellChecker currently mandates a minPrefix of 1 when editDistance=2. This prohibits a query of nusglasses from matching the indexed sunglasses term. Granted, there can be performance issues with using a minPrefix of 0, but it's a risk that a user should be allowed to take if needed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError
[ https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481721#comment-13481721 ] Dawid Weiss commented on SOLR-3849: --- Interesting. We could ignore those properties but they indicate that an AWT daemon was for some reason startup up and messed up system properties. Uwe may want to kill it rather than just ignoring these props. ScriptEngineTest failure RE system properties and ThreadLeakError - Key: SOLR-3849 URL: https://issues.apache.org/jira/browse/SOLR-3849 Project: Solr Issue Type: Bug Components: update Affects Versions: 5.0 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 Reporter: David Smiley Assignee: Uwe Schindler Fix For: 4.0, 5.0 Attachments: SOLR-3849.patch 100% reproducible for me: solr$ ant test -Dtestcase=ScriptEngineTest {noformat} [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4 [junit4:junit4] Executing 1 suite with 1 JVM. [junit4:junit4] [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest [junit4:junit4] OK 0.17s | ScriptEngineTest.testPut [junit4:junit4] OK 0.02s | ScriptEngineTest.testEvalReader [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby [junit4:junit4] Assumption #1: got: [null], expected: each not null [junit4:junit4] OK 0.01s | ScriptEngineTest.testEvalText [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByExtension [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByName [junit4:junit4] 2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING Will linger awaiting termination of 2 leaked thread(s). [junit4:junit4] 2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 thread leaked from SUITE scope at org.apache.solr.update.processor.ScriptEngineTest: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll Starting to interrupt leaked threads: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE There are still zombie threads that couldn't be terminated: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 NOTE: test params are: codec=SimpleText, sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, timezone=America/Edmonton [junit4:junit4] 2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 [junit4:junit4] 2 NOTE: All tests run in this JVM: [ScriptEngineTest] [junit4:junit4] 2 NOTE: reproduce with: ant test -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton -Dtests.file.encoding=UTF-8 [junit4:junit4] ERROR 0.00s | ScriptEngineTest (suite) [junit4:junit4] Throwable #1: java.lang.AssertionError: System properties invariant violated. 
[junit4:junit4] New keys: [junit4:junit4] sun.awt.enableExtraMouseButtons=true [junit4:junit4] sun.font.fontmanager=sun.font.CFontManager [junit4:junit4] [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) [junit4:junit4] at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at
[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError
[ https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481722#comment-13481722 ] Uwe Schindler commented on SOLR-3849: - Does anybody of you maybe have a custom scripting engine in the classpath? That could boot up some non-JDK script environment that modifies those system variables. Maybe Apple/Macintosh has some CrazyUselessAsAlwaysMäcintrashEngine shipped by default. ScriptEngineTest failure RE system properties and ThreadLeakError - Key: SOLR-3849 URL: https://issues.apache.org/jira/browse/SOLR-3849 Project: Solr Issue Type: Bug Components: update Affects Versions: 5.0 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 Reporter: David Smiley Assignee: Uwe Schindler Fix For: 4.0, 5.0 Attachments: SOLR-3849.patch 100% reproducible for me: solr$ ant test -Dtestcase=ScriptEngineTest {noformat} [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4 [junit4:junit4] Executing 1 suite with 1 JVM. [junit4:junit4] [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest [junit4:junit4] OK 0.17s | ScriptEngineTest.testPut [junit4:junit4] OK 0.02s | ScriptEngineTest.testEvalReader [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby [junit4:junit4] Assumption #1: got: [null], expected: each not null [junit4:junit4] OK 0.01s | ScriptEngineTest.testEvalText [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByExtension [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByName [junit4:junit4] 2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING Will linger awaiting termination of 2 leaked thread(s). [junit4:junit4] 2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 thread leaked from SUITE scope at org.apache.solr.update.processor.ScriptEngineTest: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll Starting to interrupt leaked threads: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE There are still zombie threads that couldn't be terminated: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 NOTE: test params are: codec=SimpleText, sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, timezone=America/Edmonton [junit4:junit4] 2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 [junit4:junit4] 2 NOTE: All tests run in this JVM: [ScriptEngineTest] [junit4:junit4] 2 NOTE: reproduce with: ant test -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton -Dtests.file.encoding=UTF-8 [junit4:junit4] ERROR 0.00s | ScriptEngineTest (suite) [junit4:junit4] Throwable #1: java.lang.AssertionError: System properties invariant violated. 
[junit4:junit4] New keys: [junit4:junit4] sun.awt.enableExtraMouseButtons=true [junit4:junit4] sun.font.fontmanager=sun.font.CFontManager [junit4:junit4] [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) [junit4:junit4] at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at
[jira] [Commented] (LUCENE-4500) Loosen up DirectSpellChecker's minPrefix requirements
[ https://issues.apache.org/jira/browse/LUCENE-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481723#comment-13481723 ] Erik Hatcher commented on LUCENE-4500: -- This patch to DirectSpellChecker does the trick (using accuracy=0.8 or less in the description example): {code} -FuzzyTermsEnum e = new FuzzyTermsEnum(terms, atts, term, editDistance, Math.max(minPrefix, editDistance-1), true); +FuzzyTermsEnum e = new FuzzyTermsEnum(terms, atts, term, editDistance, minPrefix, true); {code} In a conversation with Robert Muir, we agreed that this, rather, should keep the default that restricts to minPrefix=1 when editDistance=2, but made optional to use a minPrefix=0. Loosen up DirectSpellChecker's minPrefix requirements - Key: LUCENE-4500 URL: https://issues.apache.org/jira/browse/LUCENE-4500 Project: Lucene - Core Issue Type: Improvement Affects Versions: 4.0 Reporter: Erik Hatcher Priority: Minor DirectSpellChecker currently mandates a minPrefix of 1 when editDistance=2. This prohibits a query of nusglasses from matching the indexed sunglasses term. Granted, there can be performance issues with using a minPrefix of 0, but it's a risk that a user should be allowed to take if needed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-4500) Loosen up DirectSpellChecker's minPrefix requirements
[ https://issues.apache.org/jira/browse/LUCENE-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481723#comment-13481723 ] Erik Hatcher edited comment on LUCENE-4500 at 10/22/12 7:49 PM: This patch to DirectSpellChecker does the trick (using accuracy=0.8 or less in the description example): {code} -FuzzyTermsEnum e = new FuzzyTermsEnum(terms, atts, term, editDistance, Math.max(minPrefix, editDistance-1), true); +FuzzyTermsEnum e = new FuzzyTermsEnum(terms, atts, term, editDistance, minPrefix, true); {code} In a conversation with Robert Muir, we agreed that this, rather, should keep the default that restricts to minPrefix=1 when editDistance=2, but made optional to allow using a minPrefix=0. was (Author: ehatcher): This patch to DirectSpellChecker does the trick (using accuracy=0.8 or less in the description example): {code} -FuzzyTermsEnum e = new FuzzyTermsEnum(terms, atts, term, editDistance, Math.max(minPrefix, editDistance-1), true); +FuzzyTermsEnum e = new FuzzyTermsEnum(terms, atts, term, editDistance, minPrefix, true); {code} In a conversation with Robert Muir, we agreed that this, rather, should keep the default that restricts to minPrefix=1 when editDistance=2, but made optional to use a minPrefix=0. Loosen up DirectSpellChecker's minPrefix requirements - Key: LUCENE-4500 URL: https://issues.apache.org/jira/browse/LUCENE-4500 Project: Lucene - Core Issue Type: Improvement Affects Versions: 4.0 Reporter: Erik Hatcher Priority: Minor DirectSpellChecker currently mandates a minPrefix of 1 when editDistance=2. This prohibits a query of nusglasses from matching the indexed sunglasses term. Granted, there can be performance issues with using a minPrefix of 0, but it's a risk that a user should be allowed to take if needed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError
[ https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481727#comment-13481727 ] Dawid Weiss commented on SOLR-3849: --- The way to check is to substitute system properties with a custom implementation of Properties, override setProperty and dump a stack trace when these are actually set to see who the offender is. I'll take a look unless somebody beats me to it. ScriptEngineTest failure RE system properties and ThreadLeakError - Key: SOLR-3849 URL: https://issues.apache.org/jira/browse/SOLR-3849 Project: Solr Issue Type: Bug Components: update Affects Versions: 5.0 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 Reporter: David Smiley Assignee: Uwe Schindler Fix For: 4.0, 5.0 Attachments: SOLR-3849.patch 100% reproducible for me: solr$ ant test -Dtestcase=ScriptEngineTest {noformat} [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4 [junit4:junit4] Executing 1 suite with 1 JVM. [junit4:junit4] [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest [junit4:junit4] OK 0.17s | ScriptEngineTest.testPut [junit4:junit4] OK 0.02s | ScriptEngineTest.testEvalReader [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby [junit4:junit4] Assumption #1: got: [null], expected: each not null [junit4:junit4] OK 0.01s | ScriptEngineTest.testEvalText [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByExtension [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByName [junit4:junit4] 2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING Will linger awaiting termination of 2 leaked thread(s). [junit4:junit4] 2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 thread leaked from SUITE scope at org.apache.solr.update.processor.ScriptEngineTest: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll Starting to interrupt leaked threads: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE There are still zombie threads that couldn't be terminated: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 NOTE: test params are: codec=SimpleText, sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, timezone=America/Edmonton [junit4:junit4] 2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 [junit4:junit4] 2 NOTE: All tests run in this JVM: [ScriptEngineTest] [junit4:junit4] 2 NOTE: reproduce with: ant test -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton -Dtests.file.encoding=UTF-8 [junit4:junit4] ERROR 0.00s | ScriptEngineTest (suite) [junit4:junit4] Throwable #1: java.lang.AssertionError: System properties invariant violated. 
[junit4:junit4] New keys: [junit4:junit4] sun.awt.enableExtraMouseButtons=true [junit4:junit4] sun.font.fontmanager=sun.font.CFontManager [junit4:junit4] [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) [junit4:junit4] at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at
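A minimal version of the tracing approach Dawid describes above, using only standard JDK classes (the class name and the key filter are just for illustration), could look like this:
{code:java}
import java.util.Properties;

// Swap in a Properties subclass that logs a stack trace whenever an interesting
// property is set, so the offender shows up in the test output.
public final class PropertyTracer {
  public static void install() {
    Properties tracing = new Properties() {
      @Override
      public synchronized Object setProperty(String key, String value) {
        if (key.startsWith("sun.awt.") || key.startsWith("sun.font.")) {
          new Throwable("setProperty(" + key + "=" + value + ")").printStackTrace();
        }
        return super.setProperty(key, value); // keep normal behaviour
      }
    };
    tracing.putAll(System.getProperties()); // preserve the existing values
    System.setProperties(tracing);
  }

  public static void main(String[] args) {
    install();
    System.setProperty("sun.awt.enableExtraMouseButtons", "true"); // demo: prints a trace
  }
}
{code}
If nothing shows up, the offending code may be calling put() directly rather than setProperty(), in which case put() would need the same override.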
[jira] [Commented] (LUCENE-4500) Loosen up DirectSpellChecker's minPrefix requirements
[ https://issues.apache.org/jira/browse/LUCENE-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481728#comment-13481728 ] Robert Muir commented on LUCENE-4500: - Yeah, I think we should add an option to disable this heuristic. It was basically a perf/relevance thing (in general, edits of 2, especially considering a transposition is a single edit, along with a minPrefix of 0, can yield surprisingly irrelevant stuff). But if someone wants that... let them do it. Loosen up DirectSpellChecker's minPrefix requirements - Key: LUCENE-4500 URL: https://issues.apache.org/jira/browse/LUCENE-4500 Project: Lucene - Core Issue Type: Improvement Affects Versions: 4.0 Reporter: Erik Hatcher Priority: Minor DirectSpellChecker currently mandates a minPrefix of 1 when editDistance=2. This prohibits a query of nusglasses from matching the indexed sunglasses term. Granted, there can be performance issues with using a minPrefix of 0, but it's a risk that a user should be allowed to take if needed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
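The option discussed here could take roughly the following shape. This is only a sketch of the behaviour under discussion, with invented field and method names, not the actual DirectSpellChecker API or the eventual patch:
{code:java}
// Keep today's behaviour by default (never allow prefix 0 at editDistance 2),
// but let a caller explicitly opt out of the heuristic.
public class PrefixHeuristicSketch {
  private int minPrefix = 1;
  private boolean enforcePrefixFloor = true; // hypothetical flag

  public void setMinPrefix(int minPrefix) { this.minPrefix = minPrefix; }
  public void setEnforcePrefixFloor(boolean enforce) { this.enforcePrefixFloor = enforce; }

  int effectivePrefix(int editDistance) {
    if (enforcePrefixFloor) {
      return Math.max(minPrefix, editDistance - 1); // current behaviour
    }
    return minPrefix; // relaxed: nusglasses can now reach sunglasses at distance 2
  }
}
{code}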
[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError
[ https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481741#comment-13481741 ] Uwe Schindler commented on SOLR-3849: - The strange thing about this issue is still the fact that we have: {code:xml} sysproperty key=java.awt.headless value=true/ {code} Why is AWT booted up at all? This seems to be some OS-X Java bug. ScriptEngineTest failure RE system properties and ThreadLeakError - Key: SOLR-3849 URL: https://issues.apache.org/jira/browse/SOLR-3849 Project: Solr Issue Type: Bug Components: update Affects Versions: 5.0 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 Reporter: David Smiley Assignee: Uwe Schindler Fix For: 4.0, 5.0 Attachments: SOLR-3849.patch 100% reproducible for me: solr$ ant test -Dtestcase=ScriptEngineTest {noformat} [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4 [junit4:junit4] Executing 1 suite with 1 JVM. [junit4:junit4] [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest [junit4:junit4] OK 0.17s | ScriptEngineTest.testPut [junit4:junit4] OK 0.02s | ScriptEngineTest.testEvalReader [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby [junit4:junit4] Assumption #1: got: [null], expected: each not null [junit4:junit4] OK 0.01s | ScriptEngineTest.testEvalText [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByExtension [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByName [junit4:junit4] 2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING Will linger awaiting termination of 2 leaked thread(s). [junit4:junit4] 2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 thread leaked from SUITE scope at org.apache.solr.update.processor.ScriptEngineTest: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll Starting to interrupt leaked threads: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE There are still zombie threads that couldn't be terminated: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 NOTE: test params are: codec=SimpleText, sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, timezone=America/Edmonton [junit4:junit4] 2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 [junit4:junit4] 2 NOTE: All tests run in this JVM: [ScriptEngineTest] [junit4:junit4] 2 NOTE: reproduce with: ant test -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton -Dtests.file.encoding=UTF-8 [junit4:junit4] ERROR 0.00s | ScriptEngineTest (suite) [junit4:junit4] Throwable #1: java.lang.AssertionError: System properties invariant violated. 
[junit4:junit4] New keys: [junit4:junit4] sun.awt.enableExtraMouseButtons=true [junit4:junit4] sun.font.fontmanager=sun.font.CFontManager [junit4:junit4] [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) [junit4:junit4] at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at
[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError
[ https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481744#comment-13481744 ] Steven Rowe commented on SOLR-3849: --- bq. Do anybody of you maybe have a custom scriptng engine in classpath? My CLASSPATH env. var. is undefined. bq. Maybe Apple/Macintosh has some CrazyUselessAsAlwaysMäcintrashEngine shipped by default. Oracle produces 1.7 JDK for OS X, and the 1.6 JDK comes from Apple. ScriptEngineTest failure RE system properties and ThreadLeakError - Key: SOLR-3849 URL: https://issues.apache.org/jira/browse/SOLR-3849 Project: Solr Issue Type: Bug Components: update Affects Versions: 5.0 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 Reporter: David Smiley Assignee: Uwe Schindler Fix For: 4.0, 5.0 Attachments: SOLR-3849.patch 100% reproducible for me: solr$ ant test -Dtestcase=ScriptEngineTest {noformat} [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4 [junit4:junit4] Executing 1 suite with 1 JVM. [junit4:junit4] [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest [junit4:junit4] OK 0.17s | ScriptEngineTest.testPut [junit4:junit4] OK 0.02s | ScriptEngineTest.testEvalReader [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby [junit4:junit4] Assumption #1: got: [null], expected: each not null [junit4:junit4] OK 0.01s | ScriptEngineTest.testEvalText [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByExtension [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByName [junit4:junit4] 2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING Will linger awaiting termination of 2 leaked thread(s). [junit4:junit4] 2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 thread leaked from SUITE scope at org.apache.solr.update.processor.ScriptEngineTest: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll Starting to interrupt leaked threads: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE There are still zombie threads that couldn't be terminated: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 NOTE: test params are: codec=SimpleText, sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, timezone=America/Edmonton [junit4:junit4] 2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 [junit4:junit4] 2 NOTE: All tests run in this JVM: [ScriptEngineTest] [junit4:junit4] 2 NOTE: reproduce with: ant test -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton -Dtests.file.encoding=UTF-8 [junit4:junit4] ERROR 0.00s | ScriptEngineTest (suite) [junit4:junit4] Throwable #1: java.lang.AssertionError: System properties invariant violated. 
[junit4:junit4] New keys: [junit4:junit4] sun.awt.enableExtraMouseButtons=true [junit4:junit4] sun.font.fontmanager=sun.font.CFontManager [junit4:junit4] [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) [junit4:junit4] at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at
[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError
[ https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481745#comment-13481745 ] Robert Muir commented on SOLR-3849: --- when I run 'ant check-svn-working-copy' (even on 1.6) on my apple it boots up AWT as well. I thought we were passing headless to all this stuff now? ScriptEngineTest failure RE system properties and ThreadLeakError - Key: SOLR-3849 URL: https://issues.apache.org/jira/browse/SOLR-3849 Project: Solr Issue Type: Bug Components: update Affects Versions: 5.0 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 Reporter: David Smiley Assignee: Uwe Schindler Fix For: 4.0, 5.0 Attachments: SOLR-3849.patch 100% reproducible for me: solr$ ant test -Dtestcase=ScriptEngineTest {noformat} [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4 [junit4:junit4] Executing 1 suite with 1 JVM. [junit4:junit4] [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest [junit4:junit4] OK 0.17s | ScriptEngineTest.testPut [junit4:junit4] OK 0.02s | ScriptEngineTest.testEvalReader [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby [junit4:junit4] Assumption #1: got: [null], expected: each not null [junit4:junit4] OK 0.01s | ScriptEngineTest.testEvalText [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByExtension [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByName [junit4:junit4] 2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING Will linger awaiting termination of 2 leaked thread(s). [junit4:junit4] 2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 thread leaked from SUITE scope at org.apache.solr.update.processor.ScriptEngineTest: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll Starting to interrupt leaked threads: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE There are still zombie threads that couldn't be terminated: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 NOTE: test params are: codec=SimpleText, sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, timezone=America/Edmonton [junit4:junit4] 2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 [junit4:junit4] 2 NOTE: All tests run in this JVM: [ScriptEngineTest] [junit4:junit4] 2 NOTE: reproduce with: ant test -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton -Dtests.file.encoding=UTF-8 [junit4:junit4] ERROR 0.00s | ScriptEngineTest (suite) [junit4:junit4] Throwable #1: java.lang.AssertionError: System properties invariant violated. 
[junit4:junit4] New keys: [junit4:junit4] sun.awt.enableExtraMouseButtons=true [junit4:junit4] sun.font.fontmanager=sun.font.CFontManager [junit4:junit4] [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) [junit4:junit4] at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) [junit4:junit4] at
[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481751#comment-13481751 ] Shawn Heisey commented on SOLR-1972: Awesome, Alan! What options might we have to prevent long-running handlers from accumulating huge metrics histories and chewing up tons of RAM? Is there a get75thpercentile method? With the old patch, I do 75, 95, and 99. I would also like to add 99.9, but the old patch uses ints so that wasn't possible. When I have a moment, I will attempt to look at the javadocs for the package and answer my own questions. Unless you get to it first, I will also attempt to mod the patch to expose any memory-limiting options. Need additional query stats in admin interface - median, 95th and 99th percentile - Key: SOLR-1972 URL: https://issues.apache.org/jira/browse/SOLR-1972 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Shawn Heisey Priority: Minor Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972-url_pattern.patch I would like to see more detailed query statistics from the admin GUI. This is what you can get now: requests : 809 errors : 0 timeouts : 0 totalTime : 70053 avgTimePerRequest : 86.59209 avgRequestsPerSecond : 0.8148785 I'd like to see more data on the time per request - median, 95th percentile, 99th percentile, and any other statistical function that makes sense to include. In my environment, the first bunch of queries after startup tend to take several seconds each. I find that the average value tends to be useless until it has several thousand queries under its belt and the caches are thoroughly warmed. The statistical functions I have mentioned would quickly eliminate the influence of those initial slow queries. The system will have to store individual data about each query. I don't know if this is something Solr does already. It would be nice to have a configurable count of how many of the most recent data points are kept, to control the amount of memory the feature uses. The default value could be something like 1024 or 4096. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError
[ https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481762#comment-13481762 ] Uwe Schindler commented on SOLR-3849: - Digging around the source code of OpenJDK i found the following horrible class: http://cr.openjdk.java.net/~michaelm/7113349/7u4/1/jdk/new/raw_files/new/src/macosx/classes/apple/applescript/AppleScriptEngineFactory.java In fact this one is the factory class for (as I said before) Apple's custom ÄppleScript engine. If you look at the static ctor, you know what's happening: As soon as the scripting engine manager is loading the factory class via SPI from rt.jar, this code is executed and boots up AWT. The question is, why java.awt.headless=true does not prevent this, but I assume the if statement for that is missing. ScriptEngineTest failure RE system properties and ThreadLeakError - Key: SOLR-3849 URL: https://issues.apache.org/jira/browse/SOLR-3849 Project: Solr Issue Type: Bug Components: update Affects Versions: 5.0 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 Reporter: David Smiley Assignee: Uwe Schindler Fix For: 4.0, 5.0 Attachments: SOLR-3849.patch 100% reproducible for me: solr$ ant test -Dtestcase=ScriptEngineTest {noformat} [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4 [junit4:junit4] Executing 1 suite with 1 JVM. [junit4:junit4] [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest [junit4:junit4] OK 0.17s | ScriptEngineTest.testPut [junit4:junit4] OK 0.02s | ScriptEngineTest.testEvalReader [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby [junit4:junit4] Assumption #1: got: [null], expected: each not null [junit4:junit4] OK 0.01s | ScriptEngineTest.testEvalText [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByExtension [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByName [junit4:junit4] 2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING Will linger awaiting termination of 2 leaked thread(s). [junit4:junit4] 2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 thread leaked from SUITE scope at org.apache.solr.update.processor.ScriptEngineTest: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll Starting to interrupt leaked threads: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE There are still zombie threads that couldn't be terminated: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 NOTE: test params are: codec=SimpleText, sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, timezone=America/Edmonton [junit4:junit4] 2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 [junit4:junit4] 2 NOTE: All tests run in this JVM: [ScriptEngineTest] [junit4:junit4] 2 NOTE: reproduce with: ant test -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton -Dtests.file.encoding=UTF-8 [junit4:junit4] ERROR 0.00s | ScriptEngineTest (suite) [junit4:junit4] Throwable #1: java.lang.AssertionError: System properties invariant violated. 
[junit4:junit4] New keys: [junit4:junit4] sun.awt.enableExtraMouseButtons=true [junit4:junit4] sun.font.fontmanager=sun.font.CFontManager [junit4:junit4] [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) [junit4:junit4] at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) [junit4:junit4] at
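For illustration of the guard suspected to be missing, the usual pattern is to check headless mode before touching AWT in a static initializer. This is a hypothetical example of that pattern, not the actual AppleScriptEngineFactory source:
{code:java}
import java.awt.GraphicsEnvironment;

// Hypothetical sketch of a headless-aware static initializer: AWT/AppKit is only
// touched when the JVM is not running headless.
public final class HeadlessAwareFactorySketch {
  static {
    if (!GraphicsEnvironment.isHeadless()) {
      java.awt.Toolkit.getDefaultToolkit(); // AWT bootstrap only in a GUI-capable JVM
    }
  }

  public static void main(String[] args) {
    System.out.println("headless=" + GraphicsEnvironment.isHeadless());
  }
}
{code}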
[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError
[ https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481766#comment-13481766 ] Uwe Schindler commented on SOLR-3849: - We should file a bug at Oracle telling them that this scripting engine does not respect headless setting. ScriptEngineTest failure RE system properties and ThreadLeakError - Key: SOLR-3849 URL: https://issues.apache.org/jira/browse/SOLR-3849 Project: Solr Issue Type: Bug Components: update Affects Versions: 5.0 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 Reporter: David Smiley Assignee: Uwe Schindler Fix For: 4.0, 5.0 Attachments: SOLR-3849.patch 100% reproducible for me: solr$ ant test -Dtestcase=ScriptEngineTest {noformat} [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4 [junit4:junit4] Executing 1 suite with 1 JVM. [junit4:junit4] [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest [junit4:junit4] OK 0.17s | ScriptEngineTest.testPut [junit4:junit4] OK 0.02s | ScriptEngineTest.testEvalReader [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby [junit4:junit4] Assumption #1: got: [null], expected: each not null [junit4:junit4] OK 0.01s | ScriptEngineTest.testEvalText [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByExtension [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByName [junit4:junit4] 2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING Will linger awaiting termination of 2 leaked thread(s). [junit4:junit4] 2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 thread leaked from SUITE scope at org.apache.solr.update.processor.ScriptEngineTest: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll Starting to interrupt leaked threads: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE There are still zombie threads that couldn't be terminated: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 NOTE: test params are: codec=SimpleText, sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, timezone=America/Edmonton [junit4:junit4] 2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 [junit4:junit4] 2 NOTE: All tests run in this JVM: [ScriptEngineTest] [junit4:junit4] 2 NOTE: reproduce with: ant test -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton -Dtests.file.encoding=UTF-8 [junit4:junit4] ERROR 0.00s | ScriptEngineTest (suite) [junit4:junit4] Throwable #1: java.lang.AssertionError: System properties invariant violated. 
[junit4:junit4] New keys: [junit4:junit4] sun.awt.enableExtraMouseButtons=true [junit4:junit4] sun.font.fontmanager=sun.font.CFontManager [junit4:junit4] [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) [junit4:junit4] at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) [junit4:junit4] at java.lang.Thread.run(Thread.java:722) [junit4:junit4]
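Editorial note on the comment above: the leaked "AppKit Thread" and the new sun.awt/sun.font properties in the failure indicate that the scripting engine initializes AWT/graphics classes on Mac OS X even when the JVM is supposed to run headless. As a generic illustration only (not part of the Solr test infrastructure), this is how the headless flag is normally set and checked; it must be passed before any AWT class loads to have an effect:
{code}
// Minimal sketch: forcing and checking AWT headless mode.
// In a real test run the flag is usually passed on the command line
// (-Djava.awt.headless=true) before any AWT class is loaded; setting it
// later, as done here, may be too late to have any effect.
import java.awt.GraphicsEnvironment;

public class HeadlessCheck {
    public static void main(String[] args) {
        System.setProperty("java.awt.headless", "true");
        // isHeadless() consults the property when the graphics environment
        // is first touched.
        System.out.println("headless = " + GraphicsEnvironment.isHeadless());
    }
}
{code}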
[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481767#comment-13481767 ] Alan Woodward commented on SOLR-1972: - Hi Shawn, Metrics uses reservoir sampling to maintain its measurements, so the history is actually always a fixed size. This is configurable, but defaults to 1024 entries. There's more information at http://metrics.codahale.com/manual/core/#histograms and http://www.johndcook.com/standard_deviation.html. There are get75thpercentile and get999thpercentile methods out of the box, and you can also ask for values at arbitrary percentiles using getValue(). Need additional query stats in admin interface - median, 95th and 99th percentile - Key: SOLR-1972 URL: https://issues.apache.org/jira/browse/SOLR-1972 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Shawn Heisey Priority: Minor Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972-url_pattern.patch I would like to see more detailed query statistics from the admin GUI. This is what you can get now: requests : 809 errors : 0 timeouts : 0 totalTime : 70053 avgTimePerRequest : 86.59209 avgRequestsPerSecond : 0.8148785 I'd like to see more data on the time per request - median, 95th percentile, 99th percentile, and any other statistical function that makes sense to include. In my environment, the first bunch of queries after startup tend to take several seconds each. I find that the average value tends to be useless until it has several thousand queries under its belt and the caches are thoroughly warmed. The statistical functions I have mentioned would quickly eliminate the influence of those initial slow queries. The system will have to store individual data about each query. I don't know if this is something Solr does already. It would be nice to have a configurable count of how many of the most recent data points are kept, to control the amount of memory the feature uses. The default value could be something like 1024 or 4096. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
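For reference, a minimal sketch of reading percentiles from a Metrics timer snapshot, as described in the comment above. Class and package names follow the codahale Metrics API and may differ between Metrics versions; the registry and metric names ("requestTimes") are made up for illustration:
{code}
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Snapshot;
import com.codahale.metrics.Timer;

public class RequestTimeStats {
    public static void main(String[] args) throws InterruptedException {
        MetricRegistry registry = new MetricRegistry();
        Timer requestTimes = registry.timer("requestTimes"); // hypothetical metric name

        // Record a few fake request durations.
        for (int i = 0; i < 100; i++) {
            Timer.Context ctx = requestTimes.time();
            Thread.sleep(1);
            ctx.stop();
        }

        // The snapshot is backed by a fixed-size reservoir sample, so memory
        // use stays bounded no matter how many requests are recorded.
        Snapshot s = requestTimes.getSnapshot();
        System.out.println("median = " + s.getMedian());
        System.out.println("p75    = " + s.get75thPercentile());
        System.out.println("p95    = " + s.get95thPercentile());
        System.out.println("p99    = " + s.get99thPercentile());
        System.out.println("p99.9  = " + s.get999thPercentile());
        // Arbitrary percentile points via getValue():
        System.out.println("p98    = " + s.getValue(0.98));
    }
}
{code}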
[jira] [Commented] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec
[ https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481768#comment-13481768 ] Michael McCandless commented on LUCENE-4498: Looks good:
{noformat}
Task                QPS base   StdDev   QPS comp   StdDev   Pct diff
Respell                86.70   (3.0%)      84.04   (2.6%)     -3.1% (  -8% -   2%)
OrHighMed              41.52   (5.8%)      40.44   (6.1%)     -2.6% ( -13% -   9%)
OrHighLow              25.43   (6.0%)      24.77   (6.4%)     -2.6% ( -14% -  10%)
OrHighHigh              9.38   (5.9%)       9.15   (6.4%)     -2.5% ( -14% -  10%)
Wildcard               93.94   (4.1%)      92.36   (2.0%)     -1.7% (  -7% -   4%)
MedTerm               211.10  (12.3%)     208.78  (13.4%)     -1.1% ( -23% -  27%)
IntNRQ                 10.74  (11.3%)      10.62   (7.8%)     -1.1% ( -18% -  20%)
HighTerm               25.59  (14.0%)      25.35  (15.0%)     -1.0% ( -26% -  32%)
MedSpanNear            13.77   (2.3%)      13.68   (1.6%)     -0.7% (  -4% -   3%)
HighSloppyPhrase        4.09   (5.4%)       4.07   (5.2%)     -0.5% ( -10% -  10%)
HighSpanNear            6.84   (2.9%)       6.81   (2.1%)     -0.4% (  -5% -   4%)
Prefix3                17.81   (5.7%)      17.74   (1.5%)     -0.4% (  -7% -   7%)
Fuzzy1                 77.54   (2.5%)      77.25   (2.7%)     -0.4% (  -5% -   4%)
AndHighLow            719.17   (2.7%)     716.49   (2.3%)     -0.4% (  -5% -   4%)
Fuzzy2                 68.94   (2.4%)      68.69   (2.8%)     -0.4% (  -5% -   5%)
LowSpanNear            12.89   (1.8%)      12.85   (1.3%)     -0.3% (  -3% -   2%)
MedSloppyPhrase        29.92   (3.4%)      29.85   (3.4%)     -0.2% (  -6% -   6%)
LowTerm               500.58   (5.9%)     500.52   (7.0%)     -0.0% ( -12% -  13%)
LowSloppyPhrase         9.57   (4.4%)       9.60   (4.3%)      0.4% (  -7% -   9%)
LowPhrase               9.64   (2.8%)       9.70   (3.0%)      0.7% (  -4% -   6%)
AndHighMed             86.68   (1.2%)      87.26   (1.2%)      0.7% (  -1% -   3%)
MedPhrase               7.07   (4.3%)       7.15   (4.6%)      1.1% (  -7% -  10%)
HighPhrase              4.79   (4.8%)       4.84   (5.6%)      1.1% (  -8% -  12%)
AndHighHigh            25.81   (1.7%)      26.20   (1.2%)      1.5% (  -1% -   4%)
PKLookup              193.31   (2.1%)     204.74   (1.6%)      5.9% (   2% -   9%)
{noformat}
pulse docfreq=1 DOCS_ONLY for 4.1 codec --- Key: LUCENE-4498 URL: https://issues.apache.org/jira/browse/LUCENE-4498 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Robert Muir Attachments: LUCENE-4498_lazy.patch, LUCENE-4498.patch, LUCENE-4498.patch We have pulsing codec, but currently this has some downsides: * its very general, wrapping an arbitrary postingsformat and pulsing everything in the postings for an arbitrary docfreq/totalTermFreq cutoff * reuse is hairy: because it specializes its enums based on these cutoffs, when walking thru terms e.g. merging there is a lot of sophisticated stuff to avoid the worst cases where we clone indexinputs for tons of terms. On the other hand the way the 4.1 codec encodes primary key fields is pretty silly, we write the docStartFP vlong in the term dictionary metadata, which tells us where to seek in the .doc to read our one lonely vint. I think its worth investigating that in the DOCS_ONLY docfreq=1 case, we just write the lone doc delta where we would write docStartFP. We can avoid the hairy reuse problem too, by just supporting this in refillDocs() in BlockDocsEnum instead of specializing. This would remove the additional seek for primary key fields without really any of the downsides of pulsing today. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec
[ https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481769#comment-13481769 ] Robert Muir commented on LUCENE-4498: - This code can be simplified and generalized a bit. basically it just needs to be docFreq == 1. in this case totalTermFreq is redundant for freq, so we can e.g. pulse a term that appears 5 times but only in one doc. I'll update the patch again. pulse docfreq=1 DOCS_ONLY for 4.1 codec --- Key: LUCENE-4498 URL: https://issues.apache.org/jira/browse/LUCENE-4498 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Robert Muir Attachments: LUCENE-4498_lazy.patch, LUCENE-4498.patch, LUCENE-4498.patch We have pulsing codec, but currently this has some downsides: * its very general, wrapping an arbitrary postingsformat and pulsing everything in the postings for an arbitrary docfreq/totalTermFreq cutoff * reuse is hairy: because it specializes its enums based on these cutoffs, when walking thru terms e.g. merging there is a lot of sophisticated stuff to avoid the worst cases where we clone indexinputs for tons of terms. On the other hand the way the 4.1 codec encodes primary key fields is pretty silly, we write the docStartFP vlong in the term dictionary metadata, which tells us where to seek in the .doc to read our one lonely vint. I think its worth investigating that in the DOCS_ONLY docfreq=1 case, we just write the lone doc delta where we would write docStartFP. We can avoid the hairy reuse problem too, by just supporting this in refillDocs() in BlockDocsEnum instead of specializing. This would remove the additional seek for primary key fields without really any of the downsides of pulsing today. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
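A rough illustrative sketch of the idea discussed above, under the docFreq == 1 generalization from this comment. This is not the actual Lucene 4.1 postings format code; the class and method names (TermMeta, SketchPostingsWriter, etc.) are invented for clarity:
{code}
// Sketch only: when a term occurs in exactly one document, store the lone
// doc delta in the metadata slot that would otherwise hold the .doc file
// pointer, so reading its postings needs no extra seek.
class TermMeta {
    long docStartFPOrSingletonDoc; // file pointer, or the lone doc delta when docFreq == 1
}

class SketchPostingsWriter {
    // Called once per term after its postings have been buffered.
    TermMeta writeTermMeta(int docFreq, int singleDocDelta, long docStartFP) {
        TermMeta meta = new TermMeta();
        if (docFreq == 1) {
            // "Pulse" the posting inline; totalTermFreq can be recovered from
            // the term's freq, so nothing else needs to be written.
            meta.docStartFPOrSingletonDoc = singleDocDelta;
        } else {
            meta.docStartFPOrSingletonDoc = docStartFP;
        }
        return meta;
    }
}

class SketchDocsEnum {
    int singletonDocID = -1;

    // Mirrors the idea of handling the pulsed case inside the normal enum's
    // refill path instead of specializing a separate enum class.
    void reset(int docFreq, TermMeta meta) {
        if (docFreq == 1) {
            singletonDocID = (int) meta.docStartFPOrSingletonDoc; // decode inline doc
        } else {
            singletonDocID = -1;
            // seek(meta.docStartFPOrSingletonDoc) and decode blocks as usual ...
        }
    }
}
{code}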
[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError
[ https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481774#comment-13481774 ] Dawid Weiss commented on SOLR-3849: --- Thanks for digging, Uwe. So these property invariants are actually useful :) Since this breaks the tests we should add these two to the ignore set (at least until Oracle fixes this?). LuceneTestCase:
{code}
/**
 * These property keys will be ignored in verification of altered properties.
 * @see SystemPropertiesInvariantRule
 * @see #ruleChain
 * @see #classRules
 */
private static final String [] IGNORED_INVARIANT_PROPERTIES = {
  "user.timezone", "java.rmi.server.randomIDs"
};
{code}
ScriptEngineTest failure RE system properties and ThreadLeakError - Key: SOLR-3849 URL: https://issues.apache.org/jira/browse/SOLR-3849 Project: Solr Issue Type: Bug Components: update Affects Versions: 5.0 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 Reporter: David Smiley Assignee: Uwe Schindler Fix For: 4.0, 5.0 Attachments: SOLR-3849.patch 100% reproducible for me: solr$ ant test -Dtestcase=ScriptEngineTest {noformat} [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4 [junit4:junit4] Executing 1 suite with 1 JVM. [junit4:junit4] [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest [junit4:junit4] OK 0.17s | ScriptEngineTest.testPut [junit4:junit4] OK 0.02s | ScriptEngineTest.testEvalReader [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby [junit4:junit4] Assumption #1: got: [null], expected: each not null [junit4:junit4] OK 0.01s | ScriptEngineTest.testEvalText [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByExtension [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByName [junit4:junit4] 2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING Will linger awaiting termination of 2 leaked thread(s). [junit4:junit4] 2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 thread leaked from SUITE scope at org.apache.solr.update.processor.ScriptEngineTest: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll Starting to interrupt leaked threads: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE There are still zombie threads that couldn't be terminated: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 NOTE: test params are: codec=SimpleText, sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, timezone=America/Edmonton [junit4:junit4] 2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 [junit4:junit4] 2 NOTE: All tests run in this JVM: [ScriptEngineTest] [junit4:junit4] 2 NOTE: reproduce with: ant test -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton -Dtests.file.encoding=UTF-8 [junit4:junit4] ERROR 0.00s | ScriptEngineTest (suite) [junit4:junit4] Throwable #1: java.lang.AssertionError: System properties invariant violated. 
[junit4:junit4] New keys: [junit4:junit4] sun.awt.enableExtraMouseButtons=true [junit4:junit4] sun.font.fontmanager=sun.font.CFontManager [junit4:junit4] [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) [junit4:junit4] at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) [junit4:junit4] at
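If the ignore-set route suggested above were taken, the change would presumably just add the two keys from this failure to the array quoted in the comment. A guess at what that would look like, not a committed patch:
{code}
// Hypothetical extension of the ignore list quoted above -- purely an
// illustration of "add these two to the ignore set".
private static final String [] IGNORED_INVARIANT_PROPERTIES = {
  "user.timezone",
  "java.rmi.server.randomIDs",
  // Set as a side effect of AWT/font initialization on Mac OS X:
  "sun.awt.enableExtraMouseButtons",
  "sun.font.fontmanager"
};
{code}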
[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError
[ https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481783#comment-13481783 ] Uwe Schindler commented on SOLR-3849: - Can we ignore those *only* for this test? ScriptEngineTest failure RE system properties and ThreadLeakError - Key: SOLR-3849 URL: https://issues.apache.org/jira/browse/SOLR-3849 Project: Solr Issue Type: Bug Components: update Affects Versions: 5.0 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 Reporter: David Smiley Assignee: Uwe Schindler Fix For: 4.0, 5.0 Attachments: SOLR-3849.patch 100% reproducible for me: solr$ ant test -Dtestcase=ScriptEngineTest {noformat} [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4 [junit4:junit4] Executing 1 suite with 1 JVM. [junit4:junit4] [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest [junit4:junit4] OK 0.17s | ScriptEngineTest.testPut [junit4:junit4] OK 0.02s | ScriptEngineTest.testEvalReader [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby [junit4:junit4] Assumption #1: got: [null], expected: each not null [junit4:junit4] OK 0.01s | ScriptEngineTest.testEvalText [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByExtension [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByName [junit4:junit4] 2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING Will linger awaiting termination of 2 leaked thread(s). [junit4:junit4] 2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 thread leaked from SUITE scope at org.apache.solr.update.processor.ScriptEngineTest: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll Starting to interrupt leaked threads: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE There are still zombie threads that couldn't be terminated: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 NOTE: test params are: codec=SimpleText, sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, timezone=America/Edmonton [junit4:junit4] 2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 [junit4:junit4] 2 NOTE: All tests run in this JVM: [ScriptEngineTest] [junit4:junit4] 2 NOTE: reproduce with: ant test -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton -Dtests.file.encoding=UTF-8 [junit4:junit4] ERROR 0.00s | ScriptEngineTest (suite) [junit4:junit4] Throwable #1: java.lang.AssertionError: System properties invariant violated. 
[junit4:junit4] New keys: [junit4:junit4] sun.awt.enableExtraMouseButtons=true [junit4:junit4] sun.font.fontmanager=sun.font.CFontManager [junit4:junit4] [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) [junit4:junit4] at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) [junit4:junit4] at java.lang.Thread.run(Thread.java:722) [junit4:junit4] Throwable #2:
[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError
[ https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481786#comment-13481786 ] Dawid Weiss commented on SOLR-3849: --- I think you'd have to redefine the entire rule chain by shadowing the field. It's JUnit, not me -- sorry. ScriptEngineTest failure RE system properties and ThreadLeakError - Key: SOLR-3849 URL: https://issues.apache.org/jira/browse/SOLR-3849 Project: Solr Issue Type: Bug Components: update Affects Versions: 5.0 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 Reporter: David Smiley Assignee: Uwe Schindler Fix For: 4.0, 5.0 Attachments: SOLR-3849.patch 100% reproducible for me: solr$ ant test -Dtestcase=ScriptEngineTest {noformat} [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4 [junit4:junit4] Executing 1 suite with 1 JVM. [junit4:junit4] [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest [junit4:junit4] OK 0.17s | ScriptEngineTest.testPut [junit4:junit4] OK 0.02s | ScriptEngineTest.testEvalReader [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby [junit4:junit4] Assumption #1: got: [null], expected: each not null [junit4:junit4] OK 0.01s | ScriptEngineTest.testEvalText [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByExtension [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByName [junit4:junit4] 2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING Will linger awaiting termination of 2 leaked thread(s). [junit4:junit4] 2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 thread leaked from SUITE scope at org.apache.solr.update.processor.ScriptEngineTest: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll Starting to interrupt leaked threads: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE There are still zombie threads that couldn't be terminated: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 NOTE: test params are: codec=SimpleText, sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, timezone=America/Edmonton [junit4:junit4] 2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 [junit4:junit4] 2 NOTE: All tests run in this JVM: [ScriptEngineTest] [junit4:junit4] 2 NOTE: reproduce with: ant test -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton -Dtests.file.encoding=UTF-8 [junit4:junit4] ERROR 0.00s | ScriptEngineTest (suite) [junit4:junit4] Throwable #1: java.lang.AssertionError: System properties invariant violated. 
[junit4:junit4] New keys: [junit4:junit4] sun.awt.enableExtraMouseButtons=true [junit4:junit4] sun.font.fontmanager=sun.font.CFontManager [junit4:junit4] [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) [junit4:junit4] at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) [junit4:junit4] at java.lang.Thread.run(Thread.java:722) [junit4:junit4]
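As a lighter-weight alternative to shadowing the whole rule chain, one could in principle add a per-suite rule that snapshots the two offending properties before the suite and restores them afterwards, so the invariant check never sees a difference. This is only a sketch under that assumption; whether such a @ClassRule actually runs outside LuceneTestCase's own rule chain depends on JUnit's rule ordering, which is exactly the complication pointed out above:
{code}
import org.junit.ClassRule;
import org.junit.rules.ExternalResource;

// Sketch of a per-suite workaround (illustration only, not the committed fix):
// save the AWT-related properties before the suite and restore them after it.
public class ScriptEnginePropertiesWorkaround {

    private static final String[] KEYS = {
        "sun.awt.enableExtraMouseButtons", "sun.font.fontmanager"
    };

    @ClassRule
    public static final ExternalResource RESTORE_AWT_PROPS = new ExternalResource() {
        private final String[] saved = new String[KEYS.length];

        @Override
        protected void before() {
            for (int i = 0; i < KEYS.length; i++) {
                saved[i] = System.getProperty(KEYS[i]);
            }
        }

        @Override
        protected void after() {
            for (int i = 0; i < KEYS.length; i++) {
                if (saved[i] == null) {
                    System.clearProperty(KEYS[i]);
                } else {
                    System.setProperty(KEYS[i], saved[i]);
                }
            }
        }
    };
}
{code}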
[jira] [Updated] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec
[ https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4498: Attachment: LUCENE-4498.patch here's the docFreq=1 patch. I like this a lot better, i dont think it really buys us much but just makes the code simpler and easier to understand. pulse docfreq=1 DOCS_ONLY for 4.1 codec --- Key: LUCENE-4498 URL: https://issues.apache.org/jira/browse/LUCENE-4498 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Robert Muir Attachments: LUCENE-4498_lazy.patch, LUCENE-4498.patch, LUCENE-4498.patch, LUCENE-4498.patch We have pulsing codec, but currently this has some downsides: * its very general, wrapping an arbitrary postingsformat and pulsing everything in the postings for an arbitrary docfreq/totalTermFreq cutoff * reuse is hairy: because it specializes its enums based on these cutoffs, when walking thru terms e.g. merging there is a lot of sophisticated stuff to avoid the worst cases where we clone indexinputs for tons of terms. On the other hand the way the 4.1 codec encodes primary key fields is pretty silly, we write the docStartFP vlong in the term dictionary metadata, which tells us where to seek in the .doc to read our one lonely vint. I think its worth investigating that in the DOCS_ONLY docfreq=1 case, we just write the lone doc delta where we would write docStartFP. We can avoid the hairy reuse problem too, by just supporting this in refillDocs() in BlockDocsEnum instead of specializing. This would remove the additional seek for primary key fields without really any of the downsides of pulsing today. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481797#comment-13481797 ] Shawn Heisey commented on SOLR-1972: I have answers to some of my questions. There is a 75th percentile. I added the 75th and 999th to what you had, and it seems to display the stats page a lot faster than my patch did. We'll see what happens when it gets a few thousand queries under its belt, though. I was running the old patch with 16384 samples, and I put the stats on three handlers, so it was having to copy arrays of 16384 longs a total of six times every time I refreshed the stats page. I may also add the 98th percentile. It may be a good idea to make each percentile point configurable in solrconfig.xml. So far I have not yet figured out whether it is possible to limit the number of samples stored, or anything else which can limit the amount of memory required. The names for the average req/s over the last 5 and 15 minutes are REALLY long. Unless you have a high res display (1920 pixels wide) and maximize the window, the names overlap the values. If I think of a reasonable way to shorten those, I will. I ran into it myself when making my branch_4x patch. Need additional query stats in admin interface - median, 95th and 99th percentile - Key: SOLR-1972 URL: https://issues.apache.org/jira/browse/SOLR-1972 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Shawn Heisey Priority: Minor Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972-url_pattern.patch I would like to see more detailed query statistics from the admin GUI. This is what you can get now: requests : 809 errors : 0 timeouts : 0 totalTime : 70053 avgTimePerRequest : 86.59209 avgRequestsPerSecond : 0.8148785 I'd like to see more data on the time per request - median, 95th percentile, 99th percentile, and any other statistical function that makes sense to include. In my environment, the first bunch of queries after startup tend to take several seconds each. I find that the average value tends to be useless until it has several thousand queries under its belt and the caches are thoroughly warmed. The statistical functions I have mentioned would quickly eliminate the influence of those initial slow queries. The system will have to store individual data about each query. I don't know if this is something Solr does already. It would be nice to have a configurable count of how many of the most recent data points are kept, to control the amount of memory the feature uses. The default value could be something like 1024 or 4096. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481809#comment-13481809 ] Shawn Heisey commented on SOLR-1972: I didn't see your reply about the reservoir size until after I'd already submitted mine. If I want to increase/decrease that size, how do I do that? So far poking around the javadocs and using google hasn't turned anything up. Need additional query stats in admin interface - median, 95th and 99th percentile - Key: SOLR-1972 URL: https://issues.apache.org/jira/browse/SOLR-1972 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Shawn Heisey Priority: Minor Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972-url_pattern.patch I would like to see more detailed query statistics from the admin GUI. This is what you can get now: requests : 809 errors : 0 timeouts : 0 totalTime : 70053 avgTimePerRequest : 86.59209 avgRequestsPerSecond : 0.8148785 I'd like to see more data on the time per request - median, 95th percentile, 99th percentile, and any other statistical function that makes sense to include. In my environment, the first bunch of queries after startup tend to take several seconds each. I find that the average value tends to be useless until it has several thousand queries under its belt and the caches are thoroughly warmed. The statistical functions I have mentioned would quickly eliminate the influence of those initial slow queries. The system will have to store individual data about each query. I don't know if this is something Solr does already. It would be nice to have a configurable count of how many of the most recent data points are kept, to control the amount of memory the feature uses. The default value could be something like 1024 or 4096. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
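Editorial note on the question above: in the codahale Metrics API the reservoir is chosen when the histogram or timer is constructed, so the sample size is set at construction time rather than through a setter. A sketch under that assumption (class names follow Metrics 3.x-style naming, e.g. ExponentiallyDecayingReservoir; older Metrics versions use different names such as ExponentiallyDecayingSample, and the metric names below are made up):
{code}
import com.codahale.metrics.ExponentiallyDecayingReservoir;
import com.codahale.metrics.Histogram;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

public class CustomReservoirSize {
    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();

        // Instead of registry.timer(name), which uses the default reservoir,
        // construct the reservoir explicitly with the desired size.
        int samples = 16384;   // e.g. the 16384 samples mentioned above
        double alpha = 0.015;  // exponential-decay factor (library default)

        Timer requestTimes = registry.register("requestTimes",
            new Timer(new ExponentiallyDecayingReservoir(samples, alpha)));

        Histogram responseSizes = registry.register("responseSizes",
            new Histogram(new ExponentiallyDecayingReservoir(samples, alpha)));

        // ... use requestTimes.time() / responseSizes.update(n) as usual ...
    }
}
{code}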
[jira] [Updated] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec
[ https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4498: Attachment: LUCENE-4498.patch patch with file format docs and comment fixes. I think this is ready to go. pulse docfreq=1 DOCS_ONLY for 4.1 codec --- Key: LUCENE-4498 URL: https://issues.apache.org/jira/browse/LUCENE-4498 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Robert Muir Attachments: LUCENE-4498_lazy.patch, LUCENE-4498.patch, LUCENE-4498.patch, LUCENE-4498.patch, LUCENE-4498.patch We have pulsing codec, but currently this has some downsides: * its very general, wrapping an arbitrary postingsformat and pulsing everything in the postings for an arbitrary docfreq/totalTermFreq cutoff * reuse is hairy: because it specializes its enums based on these cutoffs, when walking thru terms e.g. merging there is a lot of sophisticated stuff to avoid the worst cases where we clone indexinputs for tons of terms. On the other hand the way the 4.1 codec encodes primary key fields is pretty silly, we write the docStartFP vlong in the term dictionary metadata, which tells us where to seek in the .doc to read our one lonely vint. I think its worth investigating that in the DOCS_ONLY docfreq=1 case, we just write the lone doc delta where we would write docStartFP. We can avoid the hairy reuse problem too, by just supporting this in refillDocs() in BlockDocsEnum instead of specializing. This would remove the additional seek for primary key fields without really any of the downsides of pulsing today. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4498) pulse docfreq=1 DOCS_ONLY for 4.1 codec
[ https://issues.apache.org/jira/browse/LUCENE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481824#comment-13481824 ] Michael McCandless commented on LUCENE-4498: +1 Very nice to fold pulsing into the default PF! pulse docfreq=1 DOCS_ONLY for 4.1 codec --- Key: LUCENE-4498 URL: https://issues.apache.org/jira/browse/LUCENE-4498 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Robert Muir Attachments: LUCENE-4498_lazy.patch, LUCENE-4498.patch, LUCENE-4498.patch, LUCENE-4498.patch, LUCENE-4498.patch We have pulsing codec, but currently this has some downsides: * its very general, wrapping an arbitrary postingsformat and pulsing everything in the postings for an arbitrary docfreq/totalTermFreq cutoff * reuse is hairy: because it specializes its enums based on these cutoffs, when walking thru terms e.g. merging there is a lot of sophisticated stuff to avoid the worst cases where we clone indexinputs for tons of terms. On the other hand the way the 4.1 codec encodes primary key fields is pretty silly, we write the docStartFP vlong in the term dictionary metadata, which tells us where to seek in the .doc to read our one lonely vint. I think its worth investigating that in the DOCS_ONLY docfreq=1 case, we just write the lone doc delta where we would write docStartFP. We can avoid the hairy reuse problem too, by just supporting this in refillDocs() in BlockDocsEnum instead of specializing. This would remove the additional seek for primary key fields without really any of the downsides of pulsing today. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org