null:java.lang.ArrayIndexOutOfBoundsException with solr-trunk
Hi All, I am indexing aspx files into solr-trunk (using ManifoldCF), and I am getting the exception below in a pretty much random manner.

solr-spec : 5.0.0.2012.08.16.22.19.11
solr-impl : 5.0-SNAPSHOT exported - iorixxx - 2012-08-16 22:19:11
lucene-spec : 5.0-SNAPSHOT
lucene-impl : 5.0-SNAPSHOT exported - iorixxx - 2012-08-16 22:16:00

When I downgrade to the version below, everything works fine.

solr-spec : 5.0.0.2012.07.19.18.36.06
solr-impl : 5.0-SNAPSHOT exported - iorixxx - 2012-07-19 18:36:06
lucene-spec : 5.0-SNAPSHOT
lucene-impl : 5.0-SNAPSHOT exported - iorixxx - 2012-07-19 18:35:10

Aug 17, 2012 10:12:46 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at org.apache.solr.common.util.FastOutputStream.write(FastOutputStream.java:79)
    at org.apache.solr.common.util.JavaBinCodec.writeStr(JavaBinCodec.java:470)
    at org.apache.solr.common.util.JavaBinCodec.writePrimitive(JavaBinCodec.java:545)
    at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:232)
    at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:149)
    at org.apache.solr.common.util.JavaBinCodec.writeSolrInputDocument(JavaBinCodec.java:388)
    at org.apache.solr.update.TransactionLog.write(TransactionLog.java:340)
    at org.apache.solr.update.UpdateLog.add(UpdateLog.java:326)
    at org.apache.solr.update.UpdateLog.add(UpdateLog.java:311)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:229)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:414)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:535)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:315)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:123)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:121)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:233)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1658)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
    at org.eclipse.jetty.server.Server.handle(Server.java:351)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
    at
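The trace fails inside System.arraycopy, called from FastOutputStream.write. The sketch below is a hypothetical, simplified buffered writer (not Solr's actual code) showing how that kind of exception arises: arraycopy throws the moment a write's position-plus-length bookkeeping overruns the destination buffer, so a missing or wrong bounds check in the writer surfaces exactly as this trace does.

```java
// Hypothetical sketch (not Solr's actual code): a simplified buffered
// writer in the style of FastOutputStream. System.arraycopy throws
// ArrayIndexOutOfBoundsException the moment pos + len overruns buf,
// which is the kind of bookkeeping error the stack trace points at.
public class BufferSketch {
    private final byte[] buf = new byte[8];
    private int pos = 0;

    // Correct variant: "flush" before the copy would overflow.
    public void writeChecked(byte[] src, int off, int len) {
        if (pos + len > buf.length) {
            pos = 0; // pretend we flushed to the underlying stream
        }
        System.arraycopy(src, off, buf, pos, len);
        pos += len;
    }

    // Buggy variant: skips the bounds check, as a regression might.
    public void writeUnchecked(byte[] src, int off, int len) {
        System.arraycopy(src, off, buf, pos, len);
        pos += len;
    }

    public static void main(String[] args) {
        byte[] data = new byte[6];
        BufferSketch ok = new BufferSketch();
        ok.writeChecked(data, 0, 6);
        ok.writeChecked(data, 0, 6); // flushes first, no exception

        BufferSketch bad = new BufferSketch();
        bad.writeUnchecked(data, 0, 6);
        boolean threw = false;
        try {
            bad.writeUnchecked(data, 0, 6); // 6 + 6 > 8: overruns buf
        } catch (ArrayIndexOutOfBoundsException e) {
            threw = true;
        }
        System.out.println(threw); // prints "true"
    }
}
```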
about example-DIH
Hello, In solr-trunk/solr/example/README.txt it says java -Dsolr.solr.home=example-DIH but it should be java -Dsolr.solr.home=example-DIH/solr (it is correct in example-DIH/README.txt).

When I execute full-import on the mail core, I get this (I am not sure if the mail core needs some extra jars):

Caused by: java.lang.ClassNotFoundException: org.apache.tika.Tika
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)

Somehow the tika core's Dataimport link does not seem to be working. The weird thing is that the other cores' links work (tested in Firefox and Safari). The db, rss and solr cores have admin-extra.html while tika and mail don't. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: about example-DIH
In solr-trunk/solr/example/README.txt it says java -Dsolr.solr.home=example-DIH but it should be java -Dsolr.solr.home=example-DIH/solr (it is correct in example-DIH/README.txt). When I execute full-import on the mail core, I get this (I am not sure if the mail core needs some extra jars):

Caused by: java.lang.ClassNotFoundException: org.apache.tika.Tika
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)

I created SOLR-3759 for these two issues above.
Re: about example-DIH
Somehow the tika core's Dataimport link does not seem to be working. The weird thing is that the other cores' links work (tested in Firefox and Safari). It seems that

<requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />

is required for the UI.
UUIDField uniqueKey with default=NEW
Hi all, I was following http://wiki.apache.org/solr/UniqueKey#UUID_techniques to set up uuid as my uniqueKey (recent solr-trunk):

<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
<field name="uniqueKey" type="uuid" indexed="true" stored="true" default="NEW" required="true" />
<uniqueKey>uniqueKey</uniqueKey>

I get the following exception:

SEVERE: null:org.apache.solr.common.SolrException: uniqueKey field (null) can not be configured with a default value (NEW)
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:496)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:851)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:539)

I got this working by adding some if checks to IndexSchema.java and UpdateCommand.java:

getType().getClass().getName().equals(UUIDField.class.getName())

But I am not sure if this is the preferred way. How can I use uuid as my uniqueKey without modifying the source code? Thanks,
Re: UUIDField uniqueKey with default=NEW
You're trying to use a feature that was removed from trunk/4x by SOLR-2796 (AddUpdateCommand.getIndexedId doesn't work with schema configured defaults/copyField - UUIDField/copyField can not be used as uniqueKey field). See: https://issues.apache.org/jira/browse/SOLR-2796 This revision: http://svn.apache.org/viewvc?view=revision&revision=1345378 Evidently the wiki was not corrected to note that the feature was removed.

Thanks Jack, that was helpful! So in order to use uuid as the uniqueKey, an update processor chain is the way to go. There are two ways to do it:

1)
<field name="uniqueKey" type="uuid" indexed="true" stored="true" required="true" />
<updateRequestProcessorChain name="default-values">
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">uniqueKey</str>
    <str name="value">NEW</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

2)
<field name="uniqueKey" type="string" indexed="true" stored="true" required="true" />
<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">uniqueKey</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Correct? I will try to update the wiki.
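Conceptually, the second chain (UUIDUpdateProcessorFactory) just assigns a freshly generated UUID to the key field of any incoming document that lacks one. The sketch below is a hypothetical illustration of that step using java.util.UUID on a plain Map, not Solr's actual processor code; the field name "uniqueKey" is taken from the schema above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of what a UUID-assigning update step does
// conceptually: if the incoming document has no value for the key
// field, generate one. Not Solr's actual processor code.
public class UuidStep {
    static final String KEY_FIELD = "uniqueKey"; // field name from the schema above

    public static Map<String, Object> assignKey(Map<String, Object> doc) {
        if (!doc.containsKey(KEY_FIELD)) {
            // randomUUID() yields a version-4 UUID such as
            // a259aa91-353f-4824-9f68-01837b721cf7 (36 characters)
            doc.put(KEY_FIELD, UUID.randomUUID().toString());
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("title", "hello");
        assignKey(doc);
        System.out.println(((String) doc.get(KEY_FIELD)).length()); // prints "36"
    }
}
```

A document that already carries a key is left untouched, which is why the chain is safe to apply to every add request.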
Re: UUIDField uniqueKey with default=NEW
: <processor class="solr.DefaultValueUpdateProcessorFactory">
:   <str name="fieldName">uniqueKey</str>
:   <str name="value">NEW</str>
: </processor>

...that approach won't work; it still relies on the UUIDField class accepting NEW as input to generate a new key, and that is no longer supported -- it happens too late in the processing for it to be used as the unique key.

I tested this approach (at revision 1379678) and it seems to work. I can see generated values, e.g. <str name="uniqueKey">a259aa91-353f-4824-9f68-01837b721cf7</str>

You may want to primarily describe the UUIDUpdateProcessorFactory as the way to generate a UUID for new documents, and then as a closing comment mention that in Solr 3 the default=NEW approach can be used instead.

Thanks for the pointer, I will try to do it.
Re: UUIDField uniqueKey with default=NEW
Hmmm... on a single node instance it might work -- You are correct, it was a single node setup.
SolrPluginUtils.docListToSolrDocumentList loads all stored fields
Hello, Regardless of the Set<String> fields parameter, the SolrPluginUtils#docListToSolrDocumentList method loads all of the stored fields. Shouldn't it just load the fields given in the set? Should I file a jira ticket?

When a small bug in a TestCase is seen, what is the preferred way to report it? Open an issue or mention it here? Example: In the SolrPluginUtilsTest.testDocListConversion method, the for loop is not executed because list.size() == 0. The commit should be inside the assertU(), and cmd.setLen() should be called.
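The behavior the report asks for amounts to copying only the stored fields named in the given set. The sketch below is a hypothetical illustration of that filtering on plain Maps (with a null set meaning "all fields"), not the actual SolrPluginUtils implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the behavior the report asks for: keep only
// the stored fields named in the given set; a null set means "all
// fields". Not the actual SolrPluginUtils code.
public class FieldFilter {
    public static Map<String, Object> select(Map<String, Object> stored,
                                             Set<String> fields) {
        if (fields == null) {
            return new LinkedHashMap<>(stored); // no restriction requested
        }
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : stored.entrySet()) {
            if (fields.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", 1);
        doc.put("subject", "hello");
        doc.put("title", "world");
        Map<String, Object> picked =
            select(doc, new HashSet<>(Arrays.asList("id")));
        System.out.println(picked.keySet()); // prints "[id]"
    }
}
```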
phps with SolrDocument
The JSONWriter.writeSolrDocument method calls writeMapOpener(-1); however, its subclass (PHPSerializedWriter) throws an exception in its writeMapOpener method if size < 0. This makes it impossible to use a SolrDocumentList as the response with phps (somehow related to SOLR-2291). Here is a snippet that demonstrates the case:

public class PHPSTest extends SolrTestCaseJ4 {

  @BeforeClass
  public static void beforeClass() throws Exception {
    initCore("solrconfig.xml", "schema.xml");
  }

  @Test
  public void testPHPS() throws Exception {
    SolrQueryRequest req = req(CommonParams.WT, "phps");
    SolrQueryResponse rsp = new SolrQueryResponse();
    PHPSerializedResponseWriter w = new PHPSerializedResponseWriter();

    Set<String> returnFields = new HashSet<String>(1);
    returnFields.add("id");
    returnFields.add("score");
    rsp.setReturnFields(returnFields);

    StringWriter buf = new StringWriter();
    SolrDocument solrDoc = new SolrDocument();
    solrDoc.addField("id", 1);
    solrDoc.addField("subject", "hello2");
    solrDoc.addField("title", "hello3");
    solrDoc.addField("score", 0.7);

    SolrDocumentList list = new SolrDocumentList();
    list.setNumFound(1);
    list.setStart(0);
    list.setMaxScore(0.7f);
    list.add(solrDoc);

    rsp.add("response", list);
    w.write(buf, req, rsp);
    System.out.println(buf.toString());
    req.close();
  }
}
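The underlying conflict is a format constraint: PHP's serialize() encodes an array as a:<count>:{...}, so the element count must be emitted before any entry is written, and a "size unknown" marker of -1 simply cannot be expressed. The sketch below is a hypothetical, minimal serializer for a string map in that format (not the actual PHPSerializedResponseWriter code), showing why the size is needed up front.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the phps format constraint: PHP's serialize()
// writes an array as a:<count>:{...}, so the count must be known
// before the first entry. A size of -1 cannot be emitted. Not the
// actual PHPSerializedResponseWriter code.
public class PhpsSketch {
    public static String serializeMap(Map<String, String> map) {
        StringBuilder sb = new StringBuilder();
        // The element count leads the encoding -- this is the value
        // that writeMapOpener(-1) would have to produce.
        sb.append("a:").append(map.size()).append(":{");
        for (Map.Entry<String, String> e : map.entrySet()) {
            sb.append("s:").append(e.getKey().length()).append(":\"")
              .append(e.getKey()).append("\";");
            sb.append("s:").append(e.getValue().length()).append(":\"")
              .append(e.getValue()).append("\";");
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        // prints a:1:{s:2:"id";s:1:"1";}
        System.out.println(serializeMap(doc));
    }
}
```

JSON has no such constraint (its map opener is just "{"), which is why the base class can get away with passing -1.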
Invalid version (expected 2, but 60) or the data in not in 'javabin' format
Hi, I was hitting the following exception when doing a distributed search. I am faceting on an int field named contentID. For some queries it was giving this error; for some queries it just works fine.

localhost:8080/solr/kanu/select/?shards=localhost:8080/solr/rega,localhost:8080/solr/kanu&indent=true&q=karar&start=0&rows=15&hl=false&wt=xml&facet=true&facet.limit=-1&facet.sort=false&json.nl=arrarr&fq=isXml:false&mm=100%&facet.field=contentID&f.contentID.facet.mincount=2

The same search URL works fine for the cores (kanu and rega) individually. Plus, if I use the rega core as the base search URL it works too, e.g. localhost:8080/solr/rega/select/?shards=localhost:8080...

I see that the rega core has lots of unique values for the contentID field, so my conclusion is that this happens when a shard response is too big. This is a bad usage of faceting, and I will remove faceting on that field since it was added accidentally. I still want to share the stack traces, since the error message is somewhat misleading.

Jan 21, 2013 10:36:53 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:300)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1701)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:109)
    at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:385)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
    at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
    at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    ... 1 more

When I add shards.tolerant=true, the exception becomes:

Jan 21, 2013 10:51:51 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.NullPointerException
    at org.apache.solr.handler.component.QueryComponent.returnFields(QueryComponent.java:967)
    at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:630)
    at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:605)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:309)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1701)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
    at
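One detail worth noting about "expected 2, but 60": byte 60 is the ASCII code of '<'. A plausible reading (an assumption on my part, not confirmed in the thread) is that the overloaded shard returned an XML or HTML error page, whose first byte the javabin parser then misread as a version marker -- consistent with the conclusion that an oversized shard response failed and came back as an error document.

```java
// The mysterious version byte 60 is simply ASCII '<' -- the first
// character of an XML/HTML document. A shard that errors out and
// returns a markup error page instead of a javabin stream would
// produce exactly this message. (Interpretation, not confirmed fact.)
public class VersionByte {
    public static void main(String[] args) {
        byte first = (byte) '<';
        System.out.println(first); // prints "60"
    }
}
```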
Re: Wild card support when stemmers are added
Hi, You have two separate options where you keep both accessorising and accessorise in the index:

1) https://issues.apache.org/jira/browse/SOLR-3231
2) Create an un-stemmed field and run wildcard queries against it too.

--- On Tue, 2/5/13, msreddy.hi msreddy...@gmail.com wrote: From: msreddy.hi msreddy...@gmail.com Subject: Re: Wild card support when stemmers are added To: dev@lucene.apache.org Date: Tuesday, February 5, 2013, 11:40 AM Thanks Jack. I will look at the option of implementing a workaround. --Saida Reddy.
Re: VOTE: RC0 Release apache-solr-ref-guide-4.6.pdf
Hi, On page 293: "rm -r shard*/solr/zoo_data" should be "rm -r node*/solr/zoo_data". On page 297: "... shard, an d forwards ..." should be "... shard, and forwards ...". Thanks, Ahmet

On Wednesday, November 27, 2013 2:47 PM, Cassandra Targett casstarg...@gmail.com wrote: I noticed a couple of small typos and inconsistencies that I've fixed, but I don't think they warrant a respin. They're more for appearance than for any factual problems. +1 Sorry for the delay from me - I've been traveling for holidays.

On Tue, Nov 26, 2013 at 4:22 AM, Jan Høydahl jan@cominvent.com wrote: * Page 5: Screenshots with 4.0.0-beta texts * Page 165: Links to 4.0.0 version of JavaDoc (now fixed in Confluence) * Page 204: Table - group.func - "Supported only in Sol4r 4.0." (should be "Supported since Solr 4.0.") (now fixed in Confluence) * Page 308: Strange xml code box layout, why all the whitespace? But these are minors, so here's my +1 -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com

25. nov. 2013 kl. 19:34 skrev Chris Hostetter hossman_luc...@fucit.org: Please VOTE to release the following as apache-solr-ref-guide-4.6.pdf ... https://dist.apache.org/repos/dist/dev/lucene/solr/ref-guide/apache-solr-ref-guide-4.6-RC0/ $ cat apache-solr-ref-guide-4.6.pdf.sha1 7ad494c5a3cdc085e01a54d507ae33a75cc319e6 apache-solr-ref-guide-4.6.pdf -Hoss
Re: Can't seem to build in order to run unit tests from IntelliJ any more.
Hi Erick, Same here, I get this error: import org.apache.lucene.queries.CommonTermsQuery; Cannot resolve CommonTermsQuery

--- On Sun, 2/17/13, Erick Erickson erickerick...@gmail.com wrote: From: Erick Erickson erickerick...@gmail.com Subject: Can't seem to build in order to run unit tests from IntelliJ any more. To: dev@lucene.apache.org Date: Sunday, February 17, 2013, 7:22 PM Anyone else having problems here? I've deleted the ivy cache, cleaned the idea project (and everything else), tried it on a fresh checkout. What am I missing? The problem is that classes are not found; I see messages in IntelliJ like: java: package org.apache.lucene.analysis does not exist java: cannot find symbol symbol: class Query location: package org.apache.lucene.search etc. I can run things from the command line just fine. Did something move? Thanks, Erick
Re: Can't seem to build in order to run unit tests from IntelliJ any more.
Hi Steve, Thanks for the fix, I can run test cases using IntelliJ now. Ahmet

--- On Mon, 2/18/13, Steve Rowe sar...@gmail.com wrote: From: Steve Rowe sar...@gmail.com Subject: Re: Can't seem to build in order to run unit tests from IntelliJ any more. To: dev@lucene.apache.org Date: Monday, February 18, 2013, 10:04 AM Hi Arslan, I just committed a fix for this particular problem (a missing queries module dependency from the highlighter module). I think Erick is having a different problem; not sure what yet. Steve

On Feb 18, 2013, at 2:41 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi Erick, Same here, I get this error: import org.apache.lucene.queries.CommonTermsQuery; Cannot resolve CommonTermsQuery
Re: Subscribe to the mailing list
Hi Abhishek, Since you sent this e-mail, you are successfully subscribed to the dev list. Please see the how-to-contribute wiki pages: http://wiki.apache.org/lucene-java/HowToContribute http://wiki.apache.org/solr/HowToContribute Welcome and happy hacking, Ahmet

On Monday, March 10, 2014 1:56 PM, Abhishek Shah igeniuss...@gmail.com wrote: Hi, I wanted to subscribe to this mailing list and wanted to contribute to the development of Lucene. -- Regards, Abhishek Shah
Re: Reducing the number of warnings in the codebase
Hi Shawn, +1 for the idea, we should take full advantage of Eclipse, IntelliJ etc. Here are some relevant tickets created by Furkan:

https://issues.apache.org/jira/browse/LUCENE-5506
https://issues.apache.org/jira/browse/SOLR-5838
https://issues.apache.org/jira/browse/SOLR-5839

I believe https://issues.apache.org/jira/browse/SOLR-5685 could be expressed as an automatic rule or something. There is already a similar check that detects usage of String.toUpperCase/toLowerCase without a Locale, and StringBuffer versus StringBuilder. Ahmet

On Sunday, March 16, 2014 12:09 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi; I've run FindBugs for the Lucene/Solr project. If you use IntelliJ IDEA you can group the warnings according to their importance. I've opened issues and attached patches for the top-level warnings/errors (and some others) that FindBugs found. On the other hand, I have another suggestion for the Lucene/Solr project. When I develop or lead projects I use Sonar. It's very good, and it runs really nice open source tools to analyze your code; FindBugs, PMD and JaCoCo are just some of them. It also calculates method complexities, LoC, etc. You can see a live example here: https://sonar.springsource.org/dashboard/index/4824 I can volunteer to integrate Sonar into the Lucene/Solr project. Thanks; Furkan KAMACI

2014-03-16 11:01 GMT+02:00 Shawn Heisey s...@elyograg.org: With the default settings in Eclipse, the Lucene/Solr codebase shows over 6000 warnings. This is the case for both branch_4x and trunk. I'm no expert, but this does seem a little excessive. If I were to take on the task of reducing this number, what advice can the group give me? Is there someone in particular that I should consider a resource for inevitable dumb questions? I haven't done an exhaustive survey, but I would imagine that most of them can be eliminated fairly easily. I'm fully aware that we may not be able to eliminate them all. One problem with fixing warnings is that the resulting patch(es) would be just as invasive as the recent work to move branch_4x to Java 7. This would complicate any ongoing work, especially large-scale work that is happening on change-specific branches. A similar topic that may require a separate discussion: FindBugs. Thanks, Shawn
Re: Reducing the number of warnings in the codebase
Hi, Here are some rules:

* Calls to the following String methods whose return value is discarded (the left-hand side is empty): String.replace(), String.toUpperCase(), String.toLowerCase(), String.replaceFirst(), String.trim()
* In test cases (subclasses of SolrTestCaseJ4), methods called without assertU() (see SOLR-5685): adoc(), optimize(), commit()
* String.toUpperCase() and String.toLowerCase() without a Locale (see SOLR-2281 and LUCENE-2466)

Can ant precommit/forbidden-apis be used to detect the above? Ahmet

On Sunday, March 16, 2014 9:53 PM, Benson Margulies bimargul...@gmail.com wrote: Just because some tool expresses distaste doesn't imply that everyone here agrees that it's a problem we should fix. In my experience, the default Sonar rulesets contain many things that people here are prone to disagree with. Start with serialVersionUID: do we care? Why would we care? In what cases do we really believe that a sane person would be using Java serialization with a Lucene/Solr class? Sonar can also be a bit cranky; it arranges for various tools to run via mechanisms that sometimes conflict with the ways you might run them yourself. So I'd suggest a process like: 1. Someone proposes a set of (e.g.) checkstyle rules to live by. 2. That ruleset is refined by experiment. 3. We make violations fail the build. Then lather, rinse, repeat for other tools. Once we have rulesets we agree are worth enforcing, we can look to Sonar for a pretty way to visualize their results if we like.
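The first rule above is always a real bug, because String is immutable: every one of those methods returns a new string and leaves the receiver untouched, so a call whose result is discarded is a no-op. A small self-contained demonstration:

```java
// Why an ignored return value from String.trim()/toUpperCase()/etc.
// is always a bug: String is immutable, so these methods return a
// new string and never modify the receiver. A rule flagging
// "s.trim();" as a bare statement would have no false positives.
public class ImmutableDemo {
    public static void main(String[] args) {
        String s = "  Hello  ";
        s.trim();          // result discarded: s is unchanged
        s.toUpperCase();   // likewise a no-op as a bare statement
        System.out.println("[" + s + "]"); // prints "[  Hello  ]"

        String t = s.trim(); // the correct pattern: use the result
        System.out.println("[" + t + "]"); // prints "[Hello]"
    }
}
```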
Re: Reducing the number of warnings in the codebase
Hi Uwe, I looked for definitions under the lucene/tools/forbiddenApis/*.txt files but I couldn't find them. Where are those rules defined? I am wondering about the syntax; can you point me to it? Thanks, Ahmet

On Sunday, March 16, 2014 10:40 PM, Uwe Schindler u...@thetaphi.de wrote: : String.toUpperCase() and String.toLowerCase() without Locale. see : SOLR-2281 and LUCENE-2466 Those are already detected by forbidden-apis.
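For reference, forbidden-apis signatures files are plain text listing fully qualified method signatures, optionally preceded by a @defaultMessage line that becomes the build failure message. A sketch of what such a rule for the Locale case could look like (an illustration of the signatures-file syntax, not a copy of Lucene's actual file):

```text
@defaultMessage Use a Locale-aware variant instead
java.lang.String#toLowerCase()
java.lang.String#toUpperCase()
```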
TruncateTokenFilter / FixedPrefixStemFilter
Hello, I would like to ask if there is interest in adding a TruncateTokenFilter to Lucene. I am using this filter as a stemmer for the Turkish language. In much academic research (clustering, classification, retrieval) it is used and called the Fixed Prefix Stemmer, the Simple Truncation Method, or F5 for short. Among F3 to F7, the F5 stemmer (length=5) is found to work well for the Turkish language in this study [1]. It is the same work from which some of stopwords_tr.txt was acquired.

[1] Information Retrieval on Turkish Texts http://www.users.muohio.edu/canf/papers/JASIST2008offPrint.pdf

ElasticSearch has this filter, but it does not respect the keyword attribute. The main advantage of F5 stemming is that it is not affected by the meaning loss caused by ascii folding; it works well with ascii folding. [2] Effects of diacritics on Turkish information retrieval http://journals.tubitak.gov.tr/elektrik/issues/elk-12-20-5/elk-20-5-9-1010-819.pdf

Here is the full type I use for customers:

<fieldType name="text_tr_ascii_f5" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ApostropheFilterFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.KeywordRepeatFilterFactory"/>
    <filter class="solr.TruncateTokenFilterFactory" prefixLength="5"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I would like to get the community's opinions on:
1) Is there interest in this? Should I create a jira issue and attach what I have got?
2) Should the keyword attribute be respected?
3) Package name: analysis.misc versus analysis.tr
4) Name of the class: TruncateTokenFilter versus FixedPrefixStemFilter
Thanks, Ahmet
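The core of F5 stemming, stripped of Lucene's TokenStream machinery, is a one-line truncation; the keyword-attribute question from point 2 is just a guard around it. The sketch below is a hypothetical, plain-method illustration (not the proposed filter implementation) showing the behavior under discussion, including leaving keyword tokens untouched.

```java
// Hypothetical sketch of fixed-prefix (F5) stemming outside Lucene's
// TokenStream machinery: truncate each token to a fixed prefix
// length, but leave tokens flagged as keywords untouched -- the
// behavior the mail argues the ElasticSearch filter lacks.
public class TruncateSketch {
    public static String stem(String token, int prefixLength, boolean keyword) {
        if (keyword || token.length() <= prefixLength) {
            return token; // keywords and short tokens pass through unchanged
        }
        return token.substring(0, prefixLength);
    }

    public static void main(String[] args) {
        System.out.println(stem("bilgisayarlarda", 5, false)); // prints "bilgi"
        System.out.println(stem("solr", 5, false));            // prints "solr"
        System.out.println(stem("bilgisayarlarda", 5, true));  // keyword kept whole
    }
}
```

In the field type above, KeywordRepeatFilterFactory emits each token twice (once keyword-flagged, once not), so after truncation and RemoveDuplicates the index keeps both the full surface form and its 5-character prefix.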
Re: [VOTE] Lucene / Solr 4.7.1 RC2
+1 SUCCESS! [1:28:08.851424] Ahmet

On Saturday, March 29, 2014 10:46 AM, Steve Rowe sar...@gmail.com wrote: Please vote for the second Release Candidate for Lucene/Solr 4.7.1. Download it here: https://people.apache.org/~sarowe/staging_area/lucene-solr-4.7.1-RC2-rev1582953/ Smoke tester cmdline (from the lucene_solr_4_7 branch): python3.2 -u dev-tools/scripts/smokeTestRelease.py \ https://people.apache.org/~sarowe/staging_area/lucene-solr-4.7.1-RC2-rev1582953/ \ 1582953 4.7.1 /tmp/4.7.1-smoke The smoke tester passed for me: SUCCESS! [0:50:29.936732] My vote: +1 Steve
ant test-help does not describe tests.disableHdfs
Hello all, Shouldn't 'ant test-help' mention the -Dtests.disableHdfs=true and -Dtests.badapples=false parameters? Ahmet
Re: [VOTE] Lucene / Solr 4.7.2 (take two)
+1 SUCCESS! [1:56:54.132500] On Friday, April 11, 2014 10:22 PM, Adrien Grand jpou...@gmail.com wrote: +1 SUCCESS! [1:10:37.098259] On Fri, Apr 11, 2014 at 9:05 PM, Anshum Gupta ans...@anshumgupta.net wrote: +1 SUCCESS! [0:57:06.986265] On Thu, Apr 10, 2014 at 7:51 AM, Robert Muir rcm...@gmail.com wrote: artifacts are here: http://people.apache.org/~rmuir/staging_area/lucene_solr_4_7_2_r1586229/ here is my +1 SUCCESS! [0:46:25.014499] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Anshum Gupta http://www.anshumgupta.net -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
why sha1?
Hello, http://www.us.apache.org/dist/lucene/solr/4.7.2/ uses the .sha1 suffix, and it looks like SHA-1 is used. According to http://www.apache.org/dev/release-signing.html#basic-facts * An SHA checksum should also be created and must be suffixed .sha * SHA-1 should be avoided Is this something that was overlooked? Ahmet
Re: [VOTE] Lucene/Solr 4.8.0 RC1
+1 SUCCESS! [1:50:28.179297] Ahmet On Tuesday, April 22, 2014 11:38 PM, Uwe Schindler u...@thetaphi.de wrote: Here is my own +1: SUCCESS! [2:09:05.595608] Solr tests passed once I raised file descriptor limit. So we should definitely fix this. I will try to reproduce in an isolated way and post stack traces. Uwe -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Tuesday, April 22, 2014 8:47 PM To: dev@lucene.apache.org Subject: [VOTE] Lucene/Solr 4.8.0 RC1 Hi, I prepared the first release candidate of Lucene and Solr 4.8.0. The artifacts can be found here: = http://people.apache.org/~uschindler/staging_area/lucene-solr-4.8.0- RC1-rev1589150/ It took a bit longer, because we had to fix some remaining bugs regarding NativeFSLockFactory, which did not work correctly and leaked file handles. I also updated the instructions about the preferred Java update versions. See also Mike's blog post: http://www.elasticsearch.org/blog/java-1-7u55-safe- use-elasticsearch-lucene/ Please check the artifacts and give your vote in the next 72 hrs. My +1 will hopefully come a little bit later because Solr tests are failing constantly on my release build and smoke tester machine. The reason: it seems to be lack of file handles. A standard Ubuntu configuration has 1024 file handles and I want a release to pass with that common default configuration. Instead, org.apache.solr.cloud.TestMiniSolrCloudCluster.testBasics fails always with crazy error messages (not about too less file handles, more that Jetty cannot start up or not bind ports or various other stuff). This did not happen on smoking 4.7.x releases. I will run now the smoker again without HDFS (via build.properties) and if that also fails then once again with more file handles. But we really have to fix our tests that they succeed with the default config of 1024 file handles. We can configure that in Jenkins (so the Jenkins job first sets and then runs ANT ulimit -n 1024). 
But this should not block the release, I just say: I gave up running those Solr tests, sorry! Anybody else can test that stuff! Uwe P.S.: Here's my smoker command line: $ JAVA_HOME=$HOME/jdk1.7.0_55 JAVA7_HOME=$HOME/jdk1.7.0_55 python3.2 -u smokeTestRelease.py ' http://people.apache.org/~uschindler/staging_area/lucene-solr-4.8.0-RC1- rev1589150/' 1589150 4.8.0 tmp - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Lucene/Solr 4.8.0 RC2
+1 SUCCESS! [1:46:42.182320] Ahmet On Friday, April 25, 2014 9:55 AM, Uwe Schindler u...@thetaphi.de wrote: Here is my +1: SUCCESS! [1:53:39.982427] Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Thursday, April 24, 2014 11:54 PM To: dev@lucene.apache.org Subject: [VOTE] Lucene/Solr 4.8.0 RC2 Hi, I prepared a second release candidate of Lucene and Solr 4.8.0. The artifacts can be found here: = http://people.apache.org/~uschindler/staging_area/lucene-solr-4.8.0- RC2-rev1589874/ This RC contains the additional fixes for SOLR-6011, LUCENE-5626, and LUCENE-5630. Please check the artifacts and give your vote in the next 72 hrs. Uwe P.S.: Here's my smoker command line: $ JAVA_HOME=$HOME/jdk1.7.0_55 JAVA7_HOME=$HOME/jdk1.7.0_55 python3.2 -u smokeTestRelease.py ' http://people.apache.org/~uschindler/staging_area/lucene-solr-4.8.0-RC2- rev1589874/' 1589874 4.8.0 tmp - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: maxThreads in Jetty
bq. Someone wrote a nice response to this recently - Shawn I think? It's Uwe: http://search-lucene.com/m/WwzTb2XJeVJ1 On Saturday, April 26, 2014 6:59 PM, Mark Miller markrmil...@gmail.com wrote: Someone wrote a nice response to this recently - Shawn I think? The gist is, there are no current plans to move to Netty in 5. So far, 5 simply makes the http layer an implementation detail of Solr and stops promising a WAR. -- Mark Miller about.me/markrmiller On April 26, 2014 at 11:28:38 AM, Toke Eskildsen (t...@statsbiblioteket.dk) wrote: Otis Gospodnetic [otis.gospodne...@gmail.com] wrote: I think moving away from Jetty and going to Netty or something like that is on the radar for 5, no? That is my understanding, but the thread-issue is just as relevant for a non-Web-container setup: It is quite hard to allocate resources if there is no real limit on burst rate. - Toke Eskildsen
Re: [VOTE] Lucene/Solr release 4.8.1
+1 SUCCESS! [1:49:01.551615] Ahmet P.S. Here's my smoker command line: python3 dev-tools/scripts/smokeTestRelease.py 'http://people.apache.org/~rmuir/staging_area/lucene_solr_4_8_1_r1594670/' 1594670 4.8.1 tmp On Friday, May 16, 2014 11:46 PM, Simon Willnauer simon.willna...@gmail.com wrote: +1 SUCCESS! [1:19:33.540237] ES seems to be happy as well simon On Fri, May 16, 2014 at 10:59 AM, Michael McCandless luc...@mikemccandless.com wrote: +1: SUCCESS! [0:39:08.550817] Mike McCandless http://blog.mikemccandless.com On Wed, May 14, 2014 at 8:58 PM, Robert Muir rcm...@gmail.com wrote: Hello, I have created a release candidate at http://people.apache.org/~rmuir/staging_area/lucene_solr_4_8_1_r1594670 Please test and vote. Here is my +1 vote. I smoketested and tried to break things over the past week during the mail outage. SUCCESS! [0:35:43.543536] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
add consumeAllTokens option to OffsetLimitTokenFilter
Hi, LimitTokenCountFilter and LimitTokenPositionFilter have a consumeAllTokens option. Any interest in adding it to OffsetLimitTokenFilter too? It looks like there is a bug (SOLR-5426) in OffsetLimitTokenFilter that causes: end() called before incrementToken() returned false! Ahmet
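For context, a rough Python sketch of what the consumeAllTokens flag controls in the Limit* filters (an approximation of the Lucene TokenFilter contract, not the real implementation): with the flag off, the upstream stream is abandoned as soon as the limit trips, which is the kind of situation that leaves end() being called before the wrapped stream ever returned false.

```python
def limit_tokens(stream, limit, consume_all_tokens=False):
    """Emit at most `limit` tokens from `stream`. With
    consume_all_tokens=True the upstream iterator is still drained
    to the end (so end-of-stream bookkeeping such as final offsets
    stays correct); with False the remaining tokens are never
    pulled at all. Returns (emitted, number_of_upstream_pulls)."""
    emitted, consumed = [], 0
    for tok in stream:
        consumed += 1
        if len(emitted) < limit:
            emitted.append(tok)
        elif not consume_all_tokens:
            break  # abandon the upstream stream mid-way
    return emitted, consumed

print(limit_tokens(["a", "b", "c", "d"], 2))
print(limit_tokens(["a", "b", "c", "d"], 2, consume_all_tokens=True))
```

Either way the caller sees two tokens; only the second call leaves the upstream source fully consumed.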
Re: Querying all docs
Hi, Could SOLR-5463 be used here? Ahmet On Tuesday, June 3, 2014 2:52 PM, Per Steffensen st...@designware.dk wrote: Thanks for responding On 03/06/14 10:32, Mikhail Khludnev wrote: On Tue, Jun 3, 2014 at 11:12 AM, Per Steffensen st...@designware.dk wrote: It is not desirable to set rows-param to e.g. MAX_VALUE, because I believe Solr will allocate memory dependent on the value of rows-param. not really. it reasonably limits it by maxdocs() https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L475 Yes I see. But I am not sure when reader.maxDocs is not just the number of docs available - we have way more than Integer.MAX_VALUE documents. * SegmentReader.maxDocs: si.info.getDocCount() * BaseCompositeReader.maxDocs: for (int i = 0; i subReaders.length; i++) { maxDoc += subReaders[i].maxDoc(); } The query I want to get all docs from, might hit 1k, 10k, 100k, 1m, ... , but never even close to Integer.MAX_VALUE. And I really do not like setting rows to something big enough, because I sure the next day someone tries to extract big enough+1 documents :-). I am sure no one will ever try to extract Integer.MAX_VALUE so that would be ok for big enough, but that just seems to use an unreasonable amount of memory. Solr and Lucene does not really suits for such all docs, which usually don't need scores and ranking, but Lucene always intended to allocate results heap for ranking. G, yes Deep paging, might help, but it's not the most achievable performance. see https://issues.apache.org/jira/browse/SOLR-5244 for some discussion, and prototype Thanks! I will definitely vote for that one. The thing I am working on here is actually some kind of export. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
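For the export use case, SOLR-5463's cursorMark API avoids both a huge rows value and deep-paging cost. A hedged sketch of the client loop (the endpoint path and the `id` uniqueKey are assumptions for illustration; the cursorMark/nextCursorMark parameters and the uniqueKey-tiebreaker sort requirement are the documented API):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def export_all(solr_url, query, fetch=None):
    """Stream every matching doc via cursor-based deep paging: the
    first request passes cursorMark=*, each response carries a
    nextCursorMark to feed into the next request, and the loop ends
    when the cursor stops advancing. `fetch` is injectable so the
    loop can be tested without a live Solr."""
    if fetch is None:
        fetch = lambda params: json.load(
            urlopen(solr_url + "/select?" + urlencode(params)))
    cursor = "*"
    while True:
        resp = fetch({"q": query, "sort": "id asc", "rows": 500,
                      "cursorMark": cursor, "wt": "json"})
        yield from resp["response"]["docs"]
        if resp["nextCursorMark"] == cursor:  # no progress: done
            return
        cursor = resp["nextCursorMark"]
```

Memory stays bounded by rows=500 regardless of how many documents the query matches.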
Re: 4.9
Hi Robert, OffsetLimitTokenFilter (used internally by the highlighter) has the following lines:

public boolean incrementToken() throws IOException {
-  if (offsetCount < offsetLimit && input.incrementToken()) {
+  if (input.incrementToken() && offsetCount < offsetLimit) {

Arun Kumar figured this out. Can you confirm this is truly a bug? His above solution fixes all three: SOLR-3193 SOLR-3901 SOLR-5426. Thanks, Ahmet On Friday, June 13, 2014 4:56 AM, Robert Muir rcm...@gmail.com wrote: We have a pretty big release already with lots of good performance improvements. I'd like to release 4.9 soon; I'll be RM. I'm thinking of spinning an RC in a week or so.
Re: [VOTE] 4.9.0
Hi, here is what I do * download solr-4.9.0.tgz * add icu4j-53.1.jar and solr-analysis-extras-4.9.0.jar and lucene-analyzers-icu-4.9.0.jar to solr-4.9.0/example/solr/collection1/lib/ * confirm they are loaded INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/Volumes/datadisk/Desktop/solr-4.9.0/example/solr/collection1/lib/icu4j-53.1.jar' to class loader INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/Volumes/datadisk/Desktop/solr-4.9.0/example/solr/collection1/lib/lucene-analyzers-icu-4.9.0.jar' to classloader INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/Volumes/datadisk/Desktop/solr-4.9.0/example/solr/collection1/lib/solr-analysis-extras-4.9.0.jar' to class loader icu4j-53.1.jar loaded twice INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/Volumes/datadisk/Desktop/solr-4.9.0/contrib/extraction/lib/icu4j-53.1.jar' to classloader * add filter class=solr.ICUFoldingFilterFactory/ to example schema.xml * java -jar start.jar yields the exception reported in SOLR-6188 When filter class=org.apache.lucene.analysis.icu.ICUFoldingFilterFactory/ is used everything works fine. Thanks, Ahmet On Friday, June 20, 2014 3:55 PM, Michael McCandless luc...@mikemccandless.com wrote: +1 SUCCESS! [0:47:26.115239] Mike McCandless http://blog.mikemccandless.com On Fri, Jun 20, 2014 at 8:13 AM, Robert Muir rcm...@gmail.com wrote: Artifacts here: http://people.apache.org/~rmuir/staging_area/lucene_solr_4_9_0_r1604085/ Here's my +1 SUCCESS! [0:35:36.654925] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] 4.9.0
Hi, +1 SUCCESS! [1:47:26.786519] python3 dev-tools/scripts/smokeTestRelease.py 'http://people.apache.org/~rmuir/staging_area/lucene_solr_4_9_0_r1604085/' 1604085 4.9.0 tmp Ahmet On Sunday, June 22, 2014 2:11 AM, Walter Underwood wun...@wunderwood.org wrote: Also, isn't JDK 7u51 a known bad release for Lucene? wunder On Jun 21, 2014, at 12:32 PM, Robert Muir rcm...@gmail.com wrote: Not *the* smoketester, instead some outdated arbitrary random smoketester from the past. please, use the latest one from the 4.9 branch. This file is supposed to be there and the smoketester actually looks for it. On Sat, Jun 21, 2014 at 3:16 PM, david.w.smi...@gmail.com david.w.smi...@gmail.com wrote: The smoke tester failed for me: lucene-solr_4x_svn$ python3.3 -u dev-tools/scripts/smokeTestRelease.py http://people.apache.org/~rmuir/staging_area/lucene_solr_4_9_0_r1604085/ 1604085 4.9.0 /Volumes/RamDisk/tmp JAVA7_HOME is /Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home NOTE: output encoding is UTF-8 Load release URL http://people.apache.org/~rmuir/staging_area/lucene_solr_4_9_0_r1604085/;... Test Lucene... test basics... get KEYS 0.1 MB in 0.69 sec (0.2 MB/sec) check changes HTML... download lucene-4.9.0-src.tgz... 27.6 MB in 94.12 sec (0.3 MB/sec) verify md5/sha1 digests verify sig verify trust GPG: gpg: WARNING: This key is not certified with a trusted signature! download lucene-4.9.0.tgz... 61.7 MB in 226.09 sec (0.3 MB/sec) verify md5/sha1 digests verify sig verify trust GPG: gpg: WARNING: This key is not certified with a trusted signature! download lucene-4.9.0.zip... 71.3 MB in 217.32 sec (0.3 MB/sec) verify md5/sha1 digests verify sig verify trust GPG: gpg: WARNING: This key is not certified with a trusted signature! unpack lucene-4.9.0.tgz... verify JAR metadata/identity/no javax.* or java.* classes... test demo with 1.7... got 5727 hits for query lucene check Lucene's javadoc JAR unpack lucene-4.9.0.zip... 
verify JAR metadata/identity/no javax.* or java.* classes... test demo with 1.7... got 5727 hits for query lucene check Lucene's javadoc JAR unpack lucene-4.9.0-src.tgz... Traceback (most recent call last): File "dev-tools/scripts/smokeTestRelease.py", line 1347, in <module> File "dev-tools/scripts/smokeTestRelease.py", line 1291, in main File "dev-tools/scripts/smokeTestRelease.py", line 1329, in smokeTest File "dev-tools/scripts/smokeTestRelease.py", line 637, in unpackAndVerify File "dev-tools/scripts/smokeTestRelease.py", line 708, in verifyUnpacked RuntimeError: lucene: unexpected files/dirs in artifact lucene-4.9.0-src.tgz: ['ivy-ignore-conflicts.properties'] And indeed, that file is there. -- Walter Underwood wun...@wunderwood.org
Re: Solr and Maven
Hi Tom, you might find https://issues.apache.org/jira/browse/LUCENE-5755 relevant. Ahmet On Saturday, July 5, 2014 1:26 AM, Tom Chen tomchen1...@gmail.com wrote: Hi, The default tool to build Solr is ant ( plus ivy), while Maven support is provided. Regarding building with Maven, some questions: 1) Is there any difference between the build created by ant and that created by Maven? 2) Any plan for Solr to use Maven as the default building tool? Regards, Tom - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Solr EventListerner where to add the implementing classes
Hi Meena, OK, try this: delete/nuke all lib directives in solrconfig.xml, then create a directory named lib under the Solr home (where solr.xml lives). This is supposed to be the most robust way of loading plugins. Also, this question is better suited to the user mailing list. Ahmet On Tuesday, January 13, 2015 1:20 AM, meena.sri...@mathworks.com meena.sri...@mathworks.com wrote: Thanks for your reply. I tried adding the plugin and referenced it in the solrconfig.xml file with no luck. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-EventListerner-where-to-add-the-implementing-classes-tp4178172p4179076.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
Re: Solr EventListerner where to add the implementing classes
Hi Meena, They are just like other plugins, please see how to load plugins section : https://wiki.apache.org/solr/SolrPlugins Ahmet On Thursday, January 8, 2015 9:16 PM, meena.sri...@mathworks.com meena.sri...@mathworks.com wrote: I am planning to implement the solr(4.9) EventListener interface to listen to the indexing event using DIH. document onImportStart =com.mathworks.brdb.indexer.StartIndexingEventListener onImportEnd=com.mathworks.brdb.indexer.EndIndexingEventListener I am not sure where to add these classes StartIndexingEventListener and EndIndexingEventListener so that solr could find them and do the necessary. Tried searching, but could not find a solution. Thanks Meena -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-EventListerner-where-to-add-the-implementing-classes-tp4178172.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: numberOfDocuments in SimilarityBase
Hi Robert, Thanks for chiming in, I created LUCENE-6711 for this. Ahmet On Thursday, July 30, 2015 4:47 PM, Robert Muir rcm...@gmail.com wrote: I think so. When adding this statistic (Lucene 4.0), personally I really wanted to fix it everywhere. But we had the problem of backwards compatibility, and it's bad to use different formulas for different segments even if it works... Nowadays we don't have Lucene 3 segments around anymore, so I think we should fix this. Want to open an issue? On Wed, Jul 29, 2015 at 10:45 AM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hello List, SimilarityBase uses CollectionStatistics#maxDoc() for numberOfDocuments. Shouldn't it be the field-based CollectionStatistics#docCount()?

--- core/src/java/org/apache/lucene/search/similarities/SimilarityBase.java (revision 1693268)
+++ core/src/java/org/apache/lucene/search/similarities/SimilarityBase.java (working copy)
@@ -102,7 +102,7 @@
   protected void fillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats) {
     // #positions(field) must be >= #positions(term)
     assert collectionStats.sumTotalTermFreq() == -1 || collectionStats.sumTotalTermFreq() >= termStats.totalTermFreq();
-    long numberOfDocuments = collectionStats.maxDoc();
+    long numberOfDocuments = collectionStats.docCount();

Thanks, Ahmet
numberOfDocuments in SimilarityBase
Hello List, SimilarityBase uses CollectionStatistics#maxDoc() for numberOfDocuments. Shouldn't it be the field-based CollectionStatistics#docCount()?

--- core/src/java/org/apache/lucene/search/similarities/SimilarityBase.java (revision 1693268)
+++ core/src/java/org/apache/lucene/search/similarities/SimilarityBase.java (working copy)
@@ -102,7 +102,7 @@
   protected void fillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats) {
     // #positions(field) must be >= #positions(term)
     assert collectionStats.sumTotalTermFreq() == -1 || collectionStats.sumTotalTermFreq() >= termStats.totalTermFreq();
-    long numberOfDocuments = collectionStats.maxDoc();
+    long numberOfDocuments = collectionStats.docCount();

Thanks, Ahmet
Re: Writing custom Tokenizer
Hi Sid, Can you provide us more details? Usually you can get away without a custom tokenizer; there may be other tricks to achieve your requirements. Ahmet On Sunday, September 27, 2015 11:29 PM, Siddhartha Singh Sandhu wrote: Hi Everyone, I wanted to write a custom tokenizer and wanted a generic direction and some guidance on how I should go about achieving this goal. Your input will be much appreciated. Regards, Sid.
Re: Writing custom Tokenizer
Hi Sid, One way is to use WhitespaceTokenizer and WordDelimiterFilter. In some cases you might want to adjust how WordDelimiterFilter splits on a per-character basis. To do this, you can supply a configuration file with the "types" attribute that specifies custom character categories. An example file is in subversion here. This is especially useful to add "hashtag or currency" searches. Please see: https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory https://issues.apache.org/jira/browse/SOLR-2059 @ => ALPHA # => ALPHA P.S. Maintaining a custom tokenizer will be a burden. It is done with *.jflex files blended with Java files. Please see ClassicTokenizerImpl.jflex in the source tree for an example. Ahmet On Monday, September 28, 2015 1:58 AM, Siddhartha Singh Sandhu <sandhus...@gmail.com> wrote: Hi Ahmet, I want primarily 3 things. 1. To include # and @ as part of the string which is tokenized by the standard tokenizer, which generally strips them off. 2. When a string is tokenized, I just want to keep tokens which are #tags and @mentions. 3. I understand there is PatternTokenizer, but I wanted to leverage the twitter-text GitHub project because I trust their regex more than my own. Not only the above three, but I also need to control the special characters that are stripped from my string while tokenizing. Please let me know of your views. Regards, Sid. On Sun, Sep 27, 2015 at 5:21 PM, Ahmet Arslan <iori...@yahoo.com.invalid> wrote: Hi Sid, > >Can you provide us more details? > >Usually you can get away without a custom tokenizer, there may be other tricks >to achieve your requirements. > >Ahmet > > > > >On Sunday, September 27, 2015 11:29 PM, Siddhartha Singh Sandhu ><sandhus...@gmail.com> wrote: > > > >Hi Everyone, > >I wanted to write a custom tokenizer and wanted a generic direction and some >guidance on how I should go about achieving this goal. > >Your input will be much appreciated. > >Regards, > >Sid.
> >- >To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
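For requirement 2 above (keep only #tags and @mentions), a pattern-based prototype may be enough before committing to a custom JFlex tokenizer. A hedged Python sketch — the regex is a deliberate oversimplification of twitter-text's real rules (which handle Unicode ranges and edge punctuation far more strictly):

```python
import re

# Simplified pattern: an @ or # followed by word characters.
TAG_OR_MENTION = re.compile(r"[@#]\w+")

def tags_and_mentions(text):
    """Return only the #tags and @mentions found in `text`,
    discarding every other token."""
    return TAG_OR_MENTION.findall(text)

print(tags_and_mentions("RT @bob: #lucene rocks, see #solr!"))
# ['@bob', '#lucene', '#solr']
```

In Solr terms this corresponds roughly to a PatternTokenizer whose pattern matches rather than splits; the twitter-text regexes could be dropped in place of the simplified one.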
Re: discountOverlaps option for QueryParser
Hi Robert, As I understand, with SynonymQuery, all expansion is recommended to be performed on query time only, and SynonymQuery will take care of the below problem : "A query for text:TV will expand into (text:TV text:Television) and the lower docFreq for text:Television will give the documents that match "Television" a much higher score then docs that match "TV" comparably -- which may be somewhat counter intuitive to the client. Index time expansion (or reduction) will result in the same idf for all documents regardless of which term the original text contained." At the end of the query analysis, if there are tokens at the same position, I need to create my SynonymQuery programmatically, right? Let me explain my concern with another example: With above analyzer, the query "foo bör" will boost the term "bör" for no reason. Just because bör will be expanded into two terms : bor and bör. Its contribution to total score is counted two times. I think this is very trappy. With SynonymQuery solution, I will index with StandardTokenizer only. No expansion at index time. I will construct the query : new TermQuery('foo') + new SynonymQuery('bor', 'bör'); Thanks, Ahmet On Monday, September 21, 2015 12:33 AM, Robert Muir <rcm...@gmail.com> wrote: Hi Ahmet, maybe have a look at the SynonymQuery added in https://issues.apache.org/jira/browse/LUCENE-6789 For query-time synonyms, it just tries to approximate what happens if you instead do this work at index-time, by creating a "pseudo-term" (disjunction of all terms at that same position) summing up the term frequency across all matching terms before passing to score(). For the statistics side it takes the maximum DF as the representative DF, and the sum of the TTF as the representative TTF. I did relevance experiments with this and the results were positive over the existing query generated (BooleanQuery with coord disabled), especially for scoring systems that don't do anything with coord. 
On Sun, Sep 20, 2015 at 1:56 PM, Ahmet Arslan <iori...@yahoo.com.invalid> wrote: > Hello, > > Assume that term t1 is expanded into multiple terms (at the same position) > during both indexing and query time. > This is possible with KeywordRepeat, SynonymFilter, or the Filters that have > preserveOriginal option for instance. > > When a two-term query (t1 t2) is executed, term t1 is boosted artificially. > Score contribution of the term t1 is counted multiple times. > It is like the query were issued with boosts : t1^3 t2 > This behaviour boosts expanded terms and may not be always desired. > E.g. (When t2 is a content-bearing word) > > I think there should be a flag/switch which is analogous to relationship > between discountOverlaps & document's length. > With this control, overlapping query terms' (tokens with a position of > increment of zero) scores are counted once. > Remaining terms (not overlapping ones) are not affected. > > Bruno asked for this functionality in the past : > http://find.searchhub.org/document/bb99e435ba35f2b1 > > What do you think about this? How difficult to implement this? > Would this be a Lucene or Solr issue? > > Thanks, > Ahmet > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
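Robert's description of SynonymQuery's statistics blending (LUCENE-6789) can be shown as a tiny calculation — a sketch of the idea, not Lucene's code:

```python
def synonym_stats(term_stats):
    """term_stats maps term -> (doc_freq, total_term_freq).
    SynonymQuery-style blending for the pseudo-term: docFreq is the
    maximum over the synonym set, totalTermFreq is the sum."""
    dfs  = [df  for df, _  in term_stats.values()]
    ttfs = [ttf for _, ttf in term_stats.values()]
    return max(dfs), sum(ttfs)

# "bor" is frequent, "bör" rare: blended stats stop the rare variant
# from getting an artificially high IDF at query time.
print(synonym_stats({"bor": (5000, 12000), "bör": (40, 90)}))
# (5000, 12090)
```

Per-document term frequencies are likewise summed across the matching terms before scoring, so a document matching either spelling is scored as if one pseudo-term occurred.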
Re: discountOverlaps option for QueryParser
Hi Doug, Boosting exact matches is not my primary concern. By the way, the ideal way to aggregate scores coming from different fields remains unclear. Maybe the geometric mean is better than summing the field scores? I just want to warn people: if filters that produce multiple tokens at the same position are used carelessly, they can cause some non-obvious boosting in a query. Thanks, Ahmet On Monday, September 21, 2015 2:38 AM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote: Another option Ahmet would be to create two fields, one that didn't do ASCII folding *without* preserving the original and another that did. The ASCII folded version is a less exacting representation of the text, and the version without ASCII folding would be more exacting. My first pass at a solution to your problem would be summing the two fields' scores. Scoring the ASCII folded field provides a higher recall signal. I'll call this the "base score." Scoring the non-ASCII folded provides a more precise ranking signal. It kicks in only when the searcher types the exact non ASCII folded term in. In a sense it acts like how most people think of a boost: bonus points for harder to meet but valuable criteria. In other words, if you match on just bor, you just get the base score. If you match on bör you'd gain the benefit of the base and the additional boost scores. The more exacting, non ASCII folded version of the field acts as a boost. On the other hand, if you don't care to differentiate between a match on an ASCII folded or non-folded version, then simply create the base ASCII folded field and score against that. Shameless plug, this is exactly the sort of thing we talk quite a bit about in John Berryman's and my book, Relevant Search (http://manning.com/turnbull). You might find it useful.
Cheers -Doug On Sunday, September 20, 2015, Ahmet Arslan <iori...@yahoo.com.invalid> wrote: Hi Robert, > >As I understand, with SynonymQuery, all expansion is recommended to be >performed on query time only, >and SynonymQuery will take care of the below problem : > >"A query for text:TV will expand into (text:TV text:Television) and the lower >docFreq for text:Television will give the documents that match "Television" a >much higher score then docs that match "TV" comparably -- which may be >somewhat counter intuitive to the client. Index time expansion (or reduction) >will result in the same idf for all documents regardless of which term the >original text contained." > > >At the end of the query analysis, if there are tokens at the same position, I >need to create my SynonymQuery programmatically, right? > > >Let me explain my concern with another example: > > > > > > > >With above analyzer, the query "foo bör" will boost the term "bör" for no >reason. >Just because bör will be expanded into two terms : bor and bör. >Its contribution to total score is counted two times. I think this is very >trappy. > >With SynonymQuery solution, I will index with StandardTokenizer only. >No expansion at index time. >I will construct the query : new TermQuery('foo') + new SynonymQuery('bor', >'bör'); > >Thanks, >Ahmet > > > > >On Monday, September 21, 2015 12:33 AM, Robert Muir <rcm...@gmail.com> wrote: >Hi Ahmet, maybe have a look at the SynonymQuery added in >https://issues.apache.org/jira/browse/LUCENE-6789 > >For query-time synonyms, it just tries to approximate what happens if >you instead do this work at index-time, by creating a "pseudo-term" >(disjunction of all terms at that same position) summing up the term >frequency across all matching terms before passing to score(). For the >statistics side it takes the maximum DF as the representative DF, and >the sum of the TTF as the representative TTF. 
> >I did relevance experiments with this and the results were positive >over the existing query generated (BooleanQuery with coord disabled), >especially for scoring systems that don't do anything with coord. > > >On Sun, Sep 20, 2015 at 1:56 PM, Ahmet Arslan <iori...@yahoo.com.invalid> >wrote: >> Hello, >> >> Assume that term t1 is expanded into multiple terms (at the same position) >> during both indexing and query time. >> This is possible with KeywordRepeat, SynonymFilter, or the Filters that have >> preserveOriginal option for instance. >> >> When a two-term query (t1 t2) is executed, term t1 is boosted artificially. >> Score contribution of the term t1 is counted multiple times. >> It is like the query were issued with boosts : t1^3 t2 >> This behaviour boosts expanded terms and may not be always desired. >> E.g. (When t2 is a content-bearing word) >> >>
checkJavadocLinks.py fails with Python 3.5.0
Hi, In an effort to run "ant precommit" I have installed Python 3.5.0. However, it fails with the following: [exec] File "/Volumes/data/workspace/solr-trunk/dev-tools/scripts/checkJavadocLinks.py", line 20, in <module> [exec] from html.parser import HTMLParser, HTMLParseError [exec] ImportError: cannot import name 'HTMLParseError' Python 3.5.0 (v3.5.0:374f501f4567, Sep 12 2015, 11:00:19) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin I tried to solve this myself and found something like: "HTMLParseError has been removed from Python 3.5". Any suggestions, given that I am Python-ignorant? Thanks, Ahmet - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
discountOverlaps option for QueryParser
Hello, Assume that term t1 is expanded into multiple terms (at the same position) during both indexing and query time. This is possible with KeywordRepeat, SynonymFilter, or the Filters that have the preserveOriginal option, for instance. When a two-term query (t1 t2) is executed, term t1 is boosted artificially. The score contribution of term t1 is counted multiple times. It is as if the query were issued with boosts: t1^3 t2. This behaviour boosts expanded terms and may not always be desired, e.g. when t2 is a content-bearing word. I think there should be a flag/switch analogous to the relationship between discountOverlaps and document length. With this control, overlapping query terms' (tokens with a position increment of zero) scores are counted once. The remaining (non-overlapping) terms are not affected. Bruno asked for this functionality in the past: http://find.searchhub.org/document/bb99e435ba35f2b1 What do you think about this? How difficult would it be to implement? Would this be a Lucene or a Solr issue? Thanks, Ahmet
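The over-counting described above can be sketched with a toy scoring function. This is plain Python, not Lucene code, and the flat per-term weight of 1.0 is an assumption standing in for a term's real tf-idf contribution:

```python
# Toy illustration of the over-counting described above -- not Lucene code.
# Assume every query term that matches a document contributes a flat
# per-term weight of 1.0 (a stand-in for its real tf-idf contribution).
def toy_score(query_terms, doc_terms):
    return sum(1.0 for t in query_terms if t in doc_terms)

# With index- and query-time expansion, t1 ("bör") becomes two tokens at
# the same position, so a two-term query (t1 t2) effectively carries three.
expanded_query = ["bor", "bör", "foo"]   # t1 expanded into two forms, plus t2
doc_terms = {"bor", "bör", "foo"}        # the document got the same expansion

score = toy_score(expanded_query, doc_terms)
print(score)  # 3.0 -- t1 contributed twice, as if the query were t1^2 t2
```

A flag like the one proposed would collapse the contributions of the two overlapping tokens ("bor", "bör") into one, giving 2.0 here instead of 3.0.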
Re: checkJavadocLinks.py fails with Python 3.5.0
Thanks Mike, it's working now. Ahmet On Wednesday, September 23, 2015 10:10 PM, Michael McCandless <luc...@mikemccandless.com> wrote: Looks like you can't be strict when parsing HTML anymore in Python 3.5: http://bugs.python.org/issue15114 I'll fix checkJavadocLinks... Mike McCandless http://blog.mikemccandless.com On Wed, Sep 23, 2015 at 2:58 PM, Alan Woodward <a...@flax.co.uk> wrote: > I hit this a couple of weeks back, when homebrew automatically upgraded me > to python 3.5. I have a separate python 3.2 installation, and added this > line to ~/build.properties: > > python32.exe=/path/to/python3.2 > > Alan Woodward > www.flax.co.uk > > > On 23 Sep 2015, at 18:06, Ahmet Arslan wrote: > > Hi, > > In an effort to run "ant precommit" I have installed Python 3.5.0. > However, it fails with the following: > > [exec] File > "/Volumes/data/workspace/solr-trunk/dev-tools/scripts/checkJavadocLinks.py", > line 20, in <module> > [exec] from html.parser import HTMLParser, HTMLParseError > [exec] ImportError: cannot import name 'HTMLParseError' > > Python 3.5.0 (v3.5.0:374f501f4567, Sep 12 2015, 11:00:19) > [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin > > I tried to solve this myself and found something like: > "HTMLParseError has been removed from Python 3.5" > > Any suggestions, given that I am Python-ignorant? > > Thanks, > Ahmet
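For scripts that must run on both older and newer Pythons, a common compatibility shim looks like the following. This is an assumption for illustration, not necessarily the change Mike committed to checkJavadocLinks.py:

```python
# Compatibility shim: html.parser.HTMLParseError (and strict mode) were
# removed in Python 3.5 -- see http://bugs.python.org/issue15114.
from html.parser import HTMLParser

try:
    from html.parser import HTMLParseError
except ImportError:
    # Define a stand-in so existing "except HTMLParseError" blocks still
    # compile; HTMLParser no longer raises it on Python 3.5+.
    class HTMLParseError(Exception):
        pass

parser = HTMLParser()
parser.feed("<p>hello")  # non-strict parsing tolerates malformed HTML
```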
Re: [VOTE] Release Lucene/Solr 5.4.0-RC1
Hi, python3 -u dev-tools/scripts/smokeTestRelease.py https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.4.0-RC1-rev1718046 gives me following exception: RuntimeError: JAR file "/private/tmp/smoke_lucene_5.4.0_1718046_1/unpack/lucene-5.4.0/analysis/common/lucene-analyzers-common-5.4.0.jar" is missing "X-Compile-Source-JDK: 1.8" inside its META-INF/MANIFEST.MF I am doing something wrong? Thanks, Ahmet On Tuesday, December 8, 2015 3:15 AM, "david.w.smi...@gmail.com"wrote: +1 for release. (tested with Java 7) SUCCESS! [0:56:31.943245] On Mon, Dec 7, 2015 at 8:05 PM Steve Rowe wrote: +1 > >Docs, javadocs, and changes look good. > >Smoke tester was happy with Java7 and Java8: > >SUCCESS! [1:53:58.550314] > >Steve > >> On Dec 7, 2015, at 5:31 AM, Upayavira wrote: >> >> Yes, Shalin, you are right. My fix was still required, but I clearly >> manually entered the SVN commit command wrong. Seeing as it does not >> impact upon the contents of the files, I have executed an SVN mv >> command, rerun the smoke test with the below, which worked: >> >> python3 -u dev-tools/scripts/smokeTestRelease.py >> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.4.0-RC1-rev1718046 >> >> Please, folks, use the above to run the smoke test for this release. >> >> Upayavira >> >> On Mon, Dec 7, 2015, at 04:00 AM, Shalin Shekhar Mangar wrote: >>> Hi Upayavira, >>> >>> The svn revision in the URL is wrong. It should be 1718046 but it is >>> 178046 which makes the smoke tester fail with the following message: >>> >>> RuntimeError: JAR file >>> "/tmp/smoke_lucene_5.4.0_178046_1/unpack/lucene-5.4.0/analysis/common/lucene-analyzers-common-5.4.0.jar" >>> is missing "Implementation-Version: 5.4.0 178046 " inside its >>> META-INF/MANIFEST.MF (wrong svn revision?) >>> >>> I think you may need to generate a new RC. But perhaps an svn move to >>> a path with the right revision number may also suffice? 
>>> >>> On Mon, Dec 7, 2015 at 9:12 AM, Shalin Shekhar Mangar >>> wrote: Thanks Upayavira. I guess Apache has started redirecting http traffic to https recently on dist.apache.org which must have broken this script. I am able to run smoke tester after applying your patch. On Mon, Dec 7, 2015 at 2:08 AM, Upayavira wrote: > The getHREFs() method is taking in an HTTPS URL, but failing to preserve > the protocol, resulting in an HTTP call that the server naturally > bounces to HTTPS. Unfortunately, the next loop round also forgets the > HTTPS, and hence we're stuck in an endless loop. Below is a patch that > fixes this issue. I'd rather someone with more knowledge of this script > confirm my suspicion and apply the patch for us all to use, as I cannot > see how this ever worked. > > I personally ran the smoke test on my local copy, so did not hit this > HTTP/HTTPS code. I'm running the HTTP version now, and will check on it > in the morning. > > Index: dev-tools/scripts/smokeTestRelease.py > === > --- dev-tools/scripts/smokeTestRelease.py (revision 1718046) > +++ dev-tools/scripts/smokeTestRelease.py (working copy) > @@ -84,7 +84,12 @@ > # Deref any redirects > while True: > url = urllib.parse.urlparse(urlString) > -h = http.client.HTTPConnection(url.netloc) > +if url.scheme == "http": > + h = http.client.HTTPConnection(url.netloc) > +elif url.scheme == "https": > + h = http.client.HTTPSConnection(url.netloc) > +else: > + raise RuntimeError("Unknown protocol: %s" % url.scheme) > h.request('GET', url.path) > r = h.getresponse() > newLoc = r.getheader('location') > > Upayavira > > On Sun, Dec 6, 2015, at 06:26 PM, Noble Paul wrote: >> Same here. >> >> On Sun, Dec 6, 2015 at 2:36 PM, Shalin Shekhar Mangar >> wrote: >>> Is anyone able to run the smoke tester on this RC? It just hangs for a >>> long time on "loading release URL" for me. 
>>> >>> python3 -u dev-tools/scripts/smokeTestRelease.py --tmp-dir >>> ../smoke-5.4 --revision 178046 --version 5.4.0 --test-java8 >>> ~/programs/jdk8 >>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.4.0-RC1-rev178046/ >>> Java 1.7 JAVA_HOME=/home/shalin/programs/jdk7 >>> Java 1.8 JAVA_HOME=/home/shalin/programs/jdk8 >>> NOTE: output encoding is UTF-8 >>> >>> Load release URL >>> "https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.4.0-RC1-rev178046/;... >>> >>> I did a strace and found that the server is returning a HTTP 301 moved >>> permanently response to the http request. >>> >>> On Sat, Dec 5, 2015 at 4:28 PM, Upayavira wrote: Please
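Upayavira's patch can be read more easily extracted into a standalone helper (the function name is illustrative; the scheme dispatch is exactly what the diff above adds):

```python
import http.client
import urllib.parse

def open_connection(url_string):
    # Pick the connection class from the URL scheme so that an https://
    # release URL is not silently requested over plain HTTP -- which the
    # server answers with a 301 redirect, causing the endless loop above.
    url = urllib.parse.urlparse(url_string)
    if url.scheme == "http":
        return http.client.HTTPConnection(url.netloc)
    elif url.scheme == "https":
        return http.client.HTTPSConnection(url.netloc)
    else:
        raise RuntimeError("Unknown protocol: %s" % url.scheme)
```

Constructing the connection object does not open a socket yet, so the helper can be tested without network access.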
Re: [VOTE] Release Lucene/Solr 5.3.2-RC2
+1 SUCCESS! [1:38:55.940645] On Tuesday, January 19, 2016 10:25 PM, Yonik Seeley wrote: +1 -Yonik On Mon, Jan 18, 2016 at 11:23 AM, Anshum Gupta wrote: > Please vote for the RC2 release candidate for Lucene/Solr 5.3.2 > > The artifacts can be downloaded from: > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.3.2-RC2-rev1725196 > > You can run the smoke tester directly with this command: > python3 -u dev-tools/scripts/smokeTestRelease.py > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.3.2-RC2-rev1725196 > > Here's my +1 > > SUCCESS! [0:26:22.094521] > > -- > Anshum Gupta
Re: [VOTE] Release Lucene/Solr 5.4.1 RC2
+1 SUCCESS! [1:50:21.498224] On Wednesday, January 20, 2016 1:28 AM, Tomás Fernández Löbbe wrote: +1 SUCCESS! [1:27:55.987215] On Tue, Jan 19, 2016 at 12:25 PM, Yonik Seeley wrote: +1 > >-Yonik > >On Mon, Jan 18, 2016 at 9:38 AM, Adrien Grand wrote: >> Please vote for the RC2 release candidate for Lucene/Solr 5.4.1 >> >> This release candidate contains 3 additional changes compared to the RC1: >> - SOLR-8496: multi-select faceting and getDocSet(List) can match >> deleted docs >> - SOLR-8418: Adapt to changes in LUCENE-6590 for use of boosts with >> MLTHandler and Simple/CloudMLTQParser >> - SOLR-8561: Add fallback to ZkController.getLeaderProps for mixed >> 5.4-pre-5.4 deployments >> >> The artifacts can be downloaded from: >> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.4.1-RC2-rev1725212 >> >> You can run the smoke tester directly with this command: >> python3 -u dev-tools/scripts/smokeTestRelease.py >> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.4.1-RC2-rev1725212 >> >> The smoke tester already passed for me both with the local and remote >> artifacts, so here is my +1.
Re: [VOTE] Release Lucene/Solr 6.0.0 RC2
+1 SUCCESS! [1:42:49.802039] Ahmet On Tuesday, April 5, 2016 1:09 AM, Anshum Gupta wrote: Thanks for taking this up Nick! Here's my +1: SUCCESS! [0:38:14.023246] On Fri, Apr 1, 2016 at 1:44 PM, Nicholas Knize wrote: Please vote for the RC2 release candidate for Lucene/Solr 6.0.0. > >Artifacts: > > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-6.0.0-RC2-rev48c80f91b8e5cd9b3a9b48e6184bd53e7619e7e3 > >Smoke tester: > > python3 -u dev-tools/scripts/smokeTestRelease.py > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-6.0.0-RC2-rev48c80f91b8e5cd9b3a9b48e6184bd53e7619e7e3 > >Here's my +1: > >SUCCESS! [0:28:59.770357] > >- Nick Knize -- Anshum Gupta
Re: Welcome Karl Wright as a Lucene/Solr committer!
Welcome Karl! On Monday, April 4, 2016 6:54 PM, Robert Muirwrote: Welcome Karl! On Mon, Apr 4, 2016 at 10:40 AM, Karl Wright wrote: > Hi all, > > Professionally, I've been active in software development since the 1970's. > My interests include many things related to software development, as well as > areas as varied as geology, carpentry, and gardening. I'm the PMC chair for > the ManifoldCF project, as well as a committer on other Apache projects such > as Http Components. > > My current employer is HERE, Inc, who is a spin-off from Nokia, who sells > map data, services, and search capabilities. > > I'm also the contributor and principal author of the Geo3D package, which is > now part of Lucene under the spatial3d module. I intend to continue to > contribute to this package for the foreseeable future. > > Thanks!! > Karl > > > On Mon, Apr 4, 2016 at 10:28 AM, Michael McCandless > wrote: >> >> I'm pleased to announce that Karl Wright has accepted the Lucene PMC's >> invitation to become a committer. >> >> Karl, it's tradition that you introduce yourself with a brief bio. >> >> Karma has been granted to your pre-existing account, so that you can >> add yourself to the committers section of the Who We Are page on the >> website: http://lucene.apache.org/whoweare.html >> >> Congratulations and welcome! >> >> Mike McCandless >> >> http://blog.mikemccandless.com > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Using rows=-1 for "give me all"
Hi Steffensen, Not sure about rows=-1, but retrieval engines are optimized to return top-N results. However, there are special commands for "give me all": https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets Ahmet On Monday, May 23, 2016 11:38 PM, Per Steffensen wrote: Hi Back when we used 4.4.0 I believe a query with rows=-1 returned all matching documents. In 5.1.0 (the one we are using now) rows=-1 will trigger a validation exception. If I remove the code that throws that exception, it seems like rows=-1 behaves like rows=0. Has the support for rows=-1 (give me all) been reintroduced in a release after 5.1.0? If yes, which JIRA ticket? If no, any plans to reintroduce it? Any good reason for changing the rows=-1 behavior? Am I the only one that liked it? :-) Regards, Per Steffensen
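As a minimal sketch of the "give me all" alternative, the request below targets Solr's /export handler. The host, collection, and field names are hypothetical, and /export requires the sorted and returned fields to have docValues:

```python
import urllib.parse

# Build an /export request URL; unlike a rows=N search, /export streams
# every matching document rather than returning a top-N page.
base = "http://localhost:8983/solr/collection1/export"  # hypothetical host/collection
params = urllib.parse.urlencode({
    "q": "*:*",
    "sort": "id asc",   # /export requires an explicit sort on a docValues field
    "fl": "id",         # only docValues fields can be exported
})
url = base + "?" + params
print(url)
```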
Re: [VOTE] Release Lucene/Solr 6.0.1 RC2
+1 SUCCESS! [1:00:26.085469] On Wednesday, May 25, 2016 11:27 AM, Tommaso Teofiliwrote: got the same warning on the GPG key signature but could not reproduce David's issue, not sure what it could be though. I'd say if no one else can reproduce it let's go ahead with the release. +1 on my side. SUCCESS! [1:19:14.997834] Regards, Tommaso Il giorno mer 25 mag 2016 alle ore 06:48 David Smiley ha scritto: I tried to run the smoke tester directly on my machine and it failed right after unpacking. Given other's success, it must be user error. What might the problem be? > > > unpack lucene-6.0.1.tgz... >verify JAR metadata/identity/no javax.* or java.* classes... >Traceback (most recent call last): > File "dev-tools/scripts/smokeTestRelease.py", line 1412, in >main() > File "dev-tools/scripts/smokeTestRelease.py", line 1356, in main >smokeTest(c.java, c.url, c.revision, c.version, c.tmp_dir, c.is_signed, ' > '.join(c.test_args)) > File "dev-tools/scripts/smokeTestRelease.py", line 1393, in smokeTest >unpackAndVerify(java, 'lucene', tmpDir, artifact, gitRevision, version, > testArgs, baseURL) > File "dev-tools/scripts/smokeTestRelease.py", line 590, in unpackAndVerify >verifyUnpacked(java, project, artifact, unpackPath, gitRevision, version, > testArgs, tmpDir, baseURL) > File "dev-tools/scripts/smokeTestRelease.py", line 712, in verifyUnpacked >checkAllJARs(os.getcwd(), project, gitRevision, version, tmpDir, baseURL) > File "dev-tools/scripts/smokeTestRelease.py", line 270, in checkAllJARs >checkJARMetaData('JAR file "%s"' % fullPath, fullPath, gitRevision, > version) > File "dev-tools/scripts/smokeTestRelease.py", line 202, in checkJARMetaData >(desc, verify)) >RuntimeError: JAR file >"/private/tmp/smoke_lucene_6.0.1_c7510a0fdd93329ec04c853c8557f4a3f2309eaf/unpack/lucene-6.0.1/analysis/common/lucene-analyzers-common-6.0.1.jar" > is missing "X-Compile-Source-JDK: 8" inside its META-INF/MANIFEST.MF > > >Separately from the smoketest, I've downloaded this RC to use it on a 
new >project and haven't found issues yet. > >On Tue, May 24, 2016 at 1:19 PM Anshum Gupta wrote: > >Thanks for doing the release, Steve. All looks good to me but I think you >should get someone to sign you GPG key :) >> >> >> >>I see this warning while running the tests: GPG: gpg: WARNING: This key is >>not certified with a trusted signature! >> >> >>Here's my +1! >> >> >>SUCCESS! [1:05:50.755245] >> >> >> >> >> >> >> >>On Tue, May 24, 2016 at 5:24 AM, Michael McCandless >> wrote: >> >>+1 >>> >>> >>>SUCCESS! [0:31:57.451386] >>> >>> >>> >>>Mike McCandless >>> >>>http://blog.mikemccandless.com >>> >>> >>>On Tue, May 24, 2016 at 12:13 AM, Steve Rowe wrote: >>> >>>Please vote for release candidate 2 for Lucene/Solr 6.0.1. (I found a >>>couple problems in CHANGES after I committed RC1 to Subversion, so I didn’t >>>call the vote, and cut RC2 instead.) The artifacts can be downloaded from: https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-6.0.1-RC2-revc7510a0fdd93329ec04c853c8557f4a3f2309eaf You can run the smoke tester directly with this command: python3 -u dev-tools/scripts/smokeTestRelease.py \ https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-6.0.1-RC2-revc7510a0fdd93329ec04c853c8557f4a3f2309eaf Here’s my +1. Docs, changes and javadocs look good. SUCCESS! [0:26:34.596490] -- Steve www.lucidworks.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org >>> >> >> >> >> >>-- >> >>Anshum Gupta >-- > >Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker >LinkedIn: http://linkedin.com/in/davidwsmiley | Book: >http://www.solrenterprisesearchserver.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
TestUAX29URLEmailTokenizer inconsistent adding dots and apostrophes to URLs and Emails
Hi, I extracted emails and URLs from certain TREC collections using TestUAX29URLEmailTokenizer combined with TypeTokenFilter. High-frequency terms reveal that * some e-mail addresses start with apostrophes * some e-mails or URLs end with a period. I ran a few tests, and this behaviour occurs only if the entity is the first or last term in the text. If the entity is in the middle of the text, UAXURLET strips the apostrophes and dots. For example, "Contact me at java-u...@lucene.apache.org. or dev@lucene.apache.org." will produce java-u...@lucene.apache.org. dev@lucene.apache.org Notice the first email has a trailing dot, while the second has not. Why does UAXURLET behave differently for the first/last token? Could this be a bug? It looks like dots and apostrophes are legal parts of the entities, but with this, abbreviations such as W.Va. D-W.Va. v.ye. are recognized as URLs. I created 8 test cases to get your opinions on this one before creating a Jira issue.

public void testURLEndingWithDot2() throws IOException {
  BaseTokenStreamTestCase.assertAnalyzesTo(a,
      "My Web addresses are www.apache.org. and lucene.apache.org",
      new String[] {"My", "Web", "addresses", "are", "www.apache.org", "and", "lucene.apache.org"},
      new String[] {"", "", "", "", "", "", ""});
}

public void testEMailStartingWithApostrophe2() throws IOException {
  BaseTokenStreamTestCase.assertAnalyzesTo(a,
      "'g...@usgs.gov 'cber_i...@a1.cber.fda.gov.",
      new String[] {"g...@usgs.gov", "cber_i...@a1.cber.fda.gov"},
      new String[] {"", "", "", ""});
}

P.S. I observed a somewhat similar phenomenon with the ICU tokenizer. The ICU tokenizer sets the script attribute to Latin for words that consist of numbers. But if the whole text is composed of words that consist of numbers, the script attribute is set to Common. Thanks, Ahmet
lucene 6.6.0 download link redirects to 6.5.1
Hi, The Lucene download page redirects to http://www-eu.apache.org/dist/lucene/java/6.5.1 for me. Solr's link is correct. Ahmet
Re: Welcome Ahmet Arslan as Lucene/Solr committer
Hi, Thanks to all for the warm welcome. It is such an honor to be invited by the PMC. I am an Assistant Professor in the Department of Computer Engineering at Anadolu University, Turkey. My current research interests include selective information retrieval and index term weighting. I started using Lucene during my master's studies for academic purposes. Later on, I worked on a number of commercial search projects using Apache Lucene/Solr. I am very proud of being part of this team! Thanks, Ahmet On Monday, December 18, 2017, 4:42:34 PM GMT+3, Steve Rowe <sar...@gmail.com> wrote: Congrats and welcome Ahmet! -- Steve www.lucidworks.com > On Dec 17, 2017, at 5:15 AM, Adrien Grand <jpou...@gmail.com> wrote: > > Hi all, > > Please join me in welcoming Ahmet Arslan as the latest Lucene/Solr committer. > Ahmet, it's tradition for you to introduce yourself with a brief bio. > > Congratulations and Welcome! > > Adrien
Re: [VOTE] Release Lucene/Solr 7.3.1 RC2
+1 SUCCESS! [1:19:56.690027] Ahmet On Saturday, May 12, 2018, 12:41:16 AM GMT+3, Michael McCandless wrote: +1 SUCCESS! [0:40:57.887333] Mike McCandless http://blog.mikemccandless.com On Fri, May 11, 2018 at 1:09 PM, Adrien Grand wrote: > +1 > SUCCESS! [1:33:37.370199] > > Le mer. 9 mai 2018 à 16:59, Mark Miller a écrit : >> Even before I saw that comment, I was thinking poor Alan... >> >> - Mark >> >> On Wed, May 9, 2018 at 7:31 AM Alan Woodward wrote: >>> +1 >>> SUCCESS! [3:10:43.862442] >>> >>> My internet has been really very slow today... >>> >>> On Wed, May 9, 2018 at 5:50 AM, Đạt Cao Mạnh >>> wrote: Please vote for release candidate 2 for Lucene/Solr 7.3.1 The artifact can be downloaded from: https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-7.3.1-RC2-revae0705edb59eaa567fe13ed3a222fdadc7153680/ You can run the smoke tester directly with this command: python3 -u dev-tools/scripts/smokeTestRelease.py https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-7.3.1-RC2-revae0705edb59eaa567fe13ed3a222fdadc7153680 Here’s my +1 SUCCESS! [0:53:47.443795] >>> >> -- >> - Mark >> about.me/markrmiller
Re: Welcome Nhat Nguyen as Lucene/Solr committer
Congratulations and Welcome! On Tuesday, June 19, 2018, 7:20:48 PM GMT+3, Jason Gerlowski wrote: Welcome Nhat! On Tue, Jun 19, 2018 at 10:10 AM, Varun Thacker wrote: > Congratulations and welcome Nhat! > > On Tue, Jun 19, 2018 at 10:16 AM, Alan Woodward wrote: >> Welcome Nhat! >> >> >>> On 18 Jun 2018, at 21:41, Adrien Grand wrote: >>> >>> Hi all, >>> >>> Please join me in welcoming Nhat Nguyen as the latest Lucene/Solr committer. >>> Nhat, it's tradition for you to introduce yourself with a brief bio. >>> >>> Congratulations and Welcome! >>> >>> Adrien >>> >> > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Release Lucene/Solr 7.3.1 RC1
+1 SUCCESS! [1:15:16.705804] Ahmet On Wednesday, May 2, 2018, 9:55:04 PM GMT+3, David Smiley wrote: +1 SUCCESS! [1:04:51.914445] On Wed, May 2, 2018 at 12:32 PM Michael McCandless wrote: > +1 > > SUCCESS! [0:49:04.927108] > > Mike McCandless > > http://blog.mikemccandless.com > > On Wed, May 2, 2018 at 6:40 AM, Đạt Cao Mạnh wrote: >> Please vote for release candidate 1 for Lucene/Solr 7.3.1 >> >> The artifacts can be downloaded from: >> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-7.3.1-RC1-rev8fa7687413558b3bc65cbbbeb722a21314187e6a >> >> You can run the smoke tester directly with this command: >> >> python3 -u dev-tools/scripts/smokeTestRelease.py \ >> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-7.3.1-RC1-rev8fa7687413558b3bc65cbbbeb722a21314187e6a >> >> Here's my +1 >> SUCCESS! [0:52:14.381028] >> > -- Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
Re: Welcome Dennis Gove to the PMC
Congratulations Dennis! Ahmet On Wednesday, December 27, 2017, 7:56:58 PM GMT+3, Dawid Weiss wrote: Congratulations Dennis! Dawid On Wed, Dec 27, 2017 at 5:37 PM, Anshum Gupta wrote: > Congratulations and welcome Dennis! > > On Wed, Dec 27, 2017 at 4:59 PM Steve Rowe wrote: >> >> Congrats and welcome Dennis! >> >> -- >> Steve >> www.lucidworks.com >> >> > On Dec 26, 2017, at 8:12 AM, Joel Bernstein wrote: >> > >> > I am pleased to announce that Dennis Gove has accepted the PMC's >> > invitation to join. >> > >> > Welcome Dennis!
Re: [VOTE] Release Lucene/Solr 7.2.1 RC1
+1 SUCCESS! [4:00:41.664562] Ahmet On Saturday, January 13, 2018, 7:06:47 PM GMT+3, Kevin Risdenwrote: Ishan - Try docker run -it openjdk:9-jdk. java was replaced with openjdk. java:9-jdk has version 9b149 where as openjdk:9-jdk has version 9.0.1-11. This should have been fixed before Java 9 GA. https://github.com/docker- library/openjdk/issues/101 Kevin Risden On Sat, Jan 13, 2018 at 6:09 AM, Ishan Chattopadhyaya wrote: This also happens with 7.2.0 and 7.1.0. Could be something to do with the official Java image. Nothing that stops the RC, I think. On Sat, Jan 13, 2018 at 5:11 PM, Ishan Chattopadhyaya wrote: I spun up a docker container with Java 9 (java:9-jdk) from docker hub [0]. Downloaded the Solr 7.2.1 RC1 tarball and unzipped it. Tried to start it, but it failed citing some crypto issue: https://gist.github.com/anonym ous/ed1a179b1043190b5f6fd635c6 a47f23 I'm trying out the same for 7.2.0 and earlier versions to see if this is a recent regression. [0] - docker run -it java:9-jdk On Wed, Jan 10, 2018 at 11:04 PM, Adrien Grand wrote: +1 SUCCESS! [1:29:47.999770] Le mer. 10 janv. 2018 à 18:03, Tomas Fernandez Lobbe a écrit : +1 SUCCESS! [1:04:34.912689] On Jan 10, 2018, at 8:01 AM, Alan Woodward wrote: +1 SUCCESS! [1:43:16.772919] I need to get a new test machine... On 10 Jan 2018, at 09:51, Dawid Weiss wrote: +1 SUCCESS! [1:31:30.029815] Dawid On Wed, Jan 10, 2018 at 10:46 AM, Shalin Shekhar Mangar wrote: +1 SUCCESS! [1:13:22.042124] On Wed, Jan 10, 2018 at 8:00 AM, jim ferenczi wrote: Please vote for release candidate 1 for Lucene/Solr 7.2.1 The artifacts can be downloaded from: https://dist.apache.org/repos/ dist/dev/lucene/lucene-solr-7. 2.1-RC1-revb2b6438b37073bee1fc a40374e85bf91aa457c0b You can run the smoke tester directly with this command: python3 -u dev-tools/scripts/smokeTestRel ease.py \ https://dist.apache.org/repos/ dist/dev/lucene/lucene-solr-7. 2.1-RC1-revb2b6438b37073bee1fc a40374e85bf91aa457c0b Here's my +1 SUCCESS! 
[0:38:10.689623] -- Regards, Shalin Shekhar Mangar. -- -- - To unsubscribe, e-mail: dev-unsubscribe@lucene.apache. org For additional commands, e-mail: dev-h...@lucene.apache.org -- -- - To unsubscribe, e-mail: dev-unsubscribe@lucene.apache. org For additional commands, e-mail: dev-h...@lucene.apache.org -- -- - To unsubscribe, e-mail: dev-unsubscribe@lucene.apache. org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Welcome Ignacio Vera as Lucene/Solr committer
Congratulations Ignacio! Ahmet On Thursday, January 11, 2018, 9:43:50 PM GMT+3, Martin Gainty wrote: ¡Bienvenidos Ignacio! Martín From: Erick Erickson Sent: Thursday, January 11, 2018 12:39 PM To: dev@lucene.apache.org Subject: Re: Welcome Ignacio Vera as Lucene/Solr committer Welcome Ignacio! On Thu, Jan 11, 2018 at 9:09 AM, Karl Wright wrote: > > Welcome, Ignacio! > Karl > > On Thu, Jan 11, 2018 at 11:46 AM, Steve Rowe wrote: > >> Congrats and welcome Ignacio! >> >> -- >> Steve >> www.lucidworks.com >> >>> On Jan 11, 2018, at 11:35 AM, Adrien Grand wrote: >>> >>> Hi all, >>> >>> Please join me in welcoming Ignacio Vera as the latest Lucene/Solr >>> committer. >>> Ignacio, it's tradition for you to introduce yourself with a brief bio. >>> >>> Congratulations and Welcome!
Re: Welcome Jason Gerlowski as committer
Congratulations and welcome Jason! On Friday, February 9, 2018, 11:58:06 AM GMT+3, Alan Woodward wrote: Welcome Jason! > On 8 Feb 2018, at 17:02, David Smiley wrote: > > Hello everyone, > > It's my pleasure to announce that Jason Gerlowski is our latest committer for > Lucene/Solr in recognition for his contributions to the project! Please join > me in welcoming him. Jason, it's tradition for you to introduce yourself > with a brief bio. > > Congratulations and Welcome! > -- > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker > LinkedIn: http://linkedin.com/in/davidwsmiley | Book: > http://www.solrenterprisesearchserver.com
Re: Welcome Karl Wright to the PMC
Congratulations Karl! Ahmet On Thursday, December 28, 2017, 7:32:41 PM GMT+3, Steve Rowe wrote: Congrats and welcome Karl! -- Steve www.lucidworks.com > On Dec 28, 2017, at 9:08 AM, Adrien Grand wrote: > > I am pleased to announce that Karl Wright has accepted the PMC's invitation > to join. > > Welcome Karl!
Re: Welcome Gus Heck as Lucene/Solr committer
Congratulations! On Friday, November 2, 2018, 7:13:35 PM GMT+3, Varun Thacker wrote: Congratulations and welcome Gus! On Thu, Nov 1, 2018 at 5:22 AM David Smiley wrote: Hi all, Please join me in welcoming Gus Heck as the latest Lucene/Solr committer! Congratulations and Welcome, Gus! Gus, it's traditional for you to introduce yourself with a brief bio. ~ David -- Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
Re: Welcome Tim Allison as a Lucene/Solr committer
Congratulations ! On Saturday, November 3, 2018, 1:43:31 AM GMT+3, Nhat Nguyen wrote: Welcome Tim! On Fri, Nov 2, 2018 at 6:33 PM Tommaso Teofili wrote: Welcome Tim!!! Tommaso Il giorno ven 2 nov 2018 alle ore 22:30 Steve Rowe ha scritto: > > Welcome Tim! > > Steve > > On Fri, Nov 2, 2018 at 12:20 PM Erick Erickson > wrote: >> >> Hi all, >> >> Please join me in welcoming Tim Allison as the latest Lucene/Solr committer! >> >> Congratulations and Welcome, Tim! >> >> It's traditional for you to introduce yourself with a brief bio. >> >> Erick >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Welcome Tomoko Uchida as Lucene/Solr committer
Congratulations Tomoko! On Tuesday, April 9, 2019, 8:48:03 PM GMT+3, Robert Muir wrote: Welcome! On Mon, Apr 8, 2019 at 11:21 AM Uwe Schindler wrote: > > Hi all, > > Please join me in welcoming Tomoko Uchida as the latest Lucene/Solr committer! > > She has been working on https://issues.apache.org/jira/browse/LUCENE-2562 for > several years with awesome progress and finally we got the fantastic Luke as > a branch on ASF JIRA: > https://gitbox.apache.org/repos/asf?p=lucene-solr.git;a=shortlog;h=refs/heads/jira/lucene-2562-luke-swing-3 > Looking forward to the first release of Apache Lucene 8.1 with Luke bundled > in the distribution. I will take care of merging it to master and 8.x > branches together with her once she got the ASF account. > > Tomoko also helped with the Japanese and Korean Analyzers. > > Congratulations and Welcome, Tomoko! Tomoko, it's traditional for you to > introduce yourself with a brief bio. > > Uwe & Robert (who nominated Tomoko) > > - > Uwe Schindler > Achterdiek 19, D-28357 Bremen > https://www.thetaphi.de > eMail: u...@thetaphi.de
[jira] Commented: (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ
[ https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007236#comment-13007236 ] Ahmet Arslan commented on SOLR-1499: Hi, Can I use this to upgrade a Solr version where the Lucene/Solr indices are not compatible? Thanks, Ahmet SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ - Key: SOLR-1499 URL: https://issues.apache.org/jira/browse/SOLR-1499 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: Lance Norskog Assignee: Erik Hatcher Fix For: Next Attachments: SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch The SolrEntityProcessor queries an external Solr instance. The Solr documents returned are unpacked and emitted as DIH fields. The SolrEntityProcessor uses the following attributes: * solr='http://localhost:8983/solr/sms' ** This gives the URL of the target Solr instance. *** Note: the connection to the target Solr uses the binary SolrJ format. * query='Jefferson&sort=id+asc' ** This gives the base query string used with Solr. It can include any standard Solr request parameter. This attribute is processed under the variable resolution rules and can be driven in an inner stage of the indexing pipeline. * rows='10' ** This gives the number of rows to fetch per request. ** The SolrEntityProcessor always fetches every document that matches the request. * fields='id,tag' ** This selects the fields to be returned from the Solr request. ** These must also be declared as field elements. ** As with all fields, template processors can be used to alter the contents to be passed downwards. * timeout='30' ** This limits the query to 30 seconds. This can be used as a fail-safe to prevent the indexing session from freezing up. By default the timeout is 5 minutes. Limitations: * Solr errors are not handled correctly. * Loop control constructs have not been tested. * Multi-valued returned fields have not been tested. The unit tests give examples of how to use it as the root entity and an inner entity. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
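The attribute list above maps onto a DIH data-config.xml entity. A minimal sketch follows; the entity name, query, and field names are illustrative placeholders, not taken from the issue:

```xml
<!-- Hypothetical DIH configuration using the SolrEntityProcessor from the patch. -->
<!-- Attribute names (solr, query, rows, fields, timeout) follow the issue description; -->
<!-- the entity name, query, and fields are placeholders. -->
<dataConfig>
  <document>
    <entity name="sourceSolr"
            processor="SolrEntityProcessor"
            solr="http://localhost:8983/solr/sms"
            query="*:*"
            rows="10"
            fields="id,tag"
            timeout="30">
      <!-- Returned fields must also be declared as field elements. -->
      <field column="id" name="id"/>
      <field column="tag" name="tag"/>
    </entity>
  </document>
</dataConfig>
```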
[jira] Commented: (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ
[ https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007516#comment-13007516 ] Ahmet Arslan commented on SOLR-1499: Hi Lance, I brought the patch up to the latest trunk. It required some changes, though. I pointed it at a Solr URL (version 1.4.0) to upgrade from 1.4.0 to trunk. I received: Caused by: java.lang.RuntimeException: Invalid version (expected 2, but 1) or the data in not in 'javabin' format at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99) at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:478) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245) What can be a workaround to overcome this?
[jira] Commented: (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ
[ https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007700#comment-13007700 ] Ahmet Arslan commented on SOLR-1499: Erik, Thanks for the pointer. As you said, when I use new CommonsHttpSolrServer(new URL("http://solr1.4.0Instance:8080/solr"), null, new XMLResponseParser(), false); I was able to communicate with the Solr 1.4.0 instance using solrj-trunk. Do you recommend modifying this patch in this manner? Any performance hits? Plus, what do you think about copy-pasting JavaBinCodec.java from the source version to the destination version, and using a custom BinaryResponseParser that uses that copy-pasted class? It seems to work for 1.4.0 to trunk. Or should I stick with writing a little script to do it? P.S. I am just trying to use a feature that will be maintained by the Solr community.
[jira] Updated: (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ
[ https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-1499: --- Attachment: SOLR-1499.patch Brought up to trunk version 1082579. Added a (format=javabin|xml) parameter; xml is needed for a Solr upgrade where the Solr versions are not compatible. Test cases need to be updated.
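With the format parameter added by this patch, an upgrade entity might look like the following. Only the format attribute and its javabin|xml values come from the patch description; the entity name, URL, query, and field are illustrative placeholders:

```xml
<!-- Hypothetical upgrade entity: pull documents from an old Solr 1.4.0 instance -->
<!-- over XML, since its javabin version is incompatible with trunk. -->
<entity name="oldSolr"
        processor="SolrEntityProcessor"
        solr="http://old-host:8080/solr"
        query="*:*"
        format="xml">
  <field column="id" name="id"/>
</entity>
```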
[jira] [Commented] (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ
[ https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059333#comment-13059333 ] Ahmet Arslan commented on SOLR-1499: Lance, I used it once to upgrade.
[jira] [Commented] (LUCENE-2208) Token div exceeds length of provided text sized 4114
[ https://issues.apache.org/jira/browse/LUCENE-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059836#comment-13059836 ] Ahmet Arslan commented on LUCENE-2208: -- Hello, I am using a very recent trunk, and I received the same exception (InvalidTokenOffsetsException) with PatternReplaceCharFilter. I observed that HTMLStripCharFilter sometimes causes the wrong words to get highlighted. So I was playing with PatternReplaceCharFilter to somehow remove html tags, hoping highlighting wouldn't be broken this time. I remember the tokenizer versions of htmlStrip had problems with highlighting; it seems this continues with the char filters. Hsiu Wang, do you think the reason (HTMLStripCharFilter causing wrong words to get highlighted) is the same as what you explained here? Token div exceeds length of provided text sized 4114 Key: LUCENE-2208 URL: https://issues.apache.org/jira/browse/LUCENE-2208 Project: Lucene - Java Issue Type: Bug Components: modules/highlighter Affects Versions: 3.0 Environment: diagnostics = {os.version=5.1, os=Windows XP, lucene.version=3.0.0 883080 - 2009-11-22 15:43:58, source=flush, os.arch=x86, java.version=1.6.0_12, java.vendor=Sun Microsystems Inc.} Reporter: Ramazan VARLIKLI Attachments: LUCENE-2208.patch, LUCENE-2208_test.patch I have a doc which contains html code. I want to strip the html tags and make the text clean, then apply the highlighter on the clear text. But the highlighter throws an exception if I strip out the html characters; if I don't strip them out, it works fine. It just confuses me at the moment. I copy-paste 3 things here from the console as they may contain special characters which might cause the problem. 
1 -) Here is the html text <h2>Starter</h2> <div id="tab1-content" class="tabContent selected"> <div class="head"></div> <div class="body"> <div class="subject-header">Learning path: History</div> <h3>Key question</h3> <p>Did transport fuel the industrial revolution?</p> <h3>Learning Objective</h3> <ul> <li>To categorise points as for or against an argument</li> </ul> <p> <h3>What to do?</h3> <ul> <li>Watch the clip: <em>Transport fuelled the industrial revolution.</em></li> </ul> <p>The clips claims that transport fuelled the industrial revolution. Some historians argue that the industrial revolution only happened because of developments in transport.</p> <ul> <li>Read the statements below and decide which points are <em>for</em> and which points are <em>against</em> the argument that industry expanded in the 18th and 19th centuries because of developments in transport.</li> </ul> <ol type="a"> <li>Industry expanded because of inventions and the discovery of steam power.</li> <li>Improvements in transport allowed goods to be sold all over the country and all over the world so there were more customers to develop industry for.</li> <li>Developments in transport allowed resources, such as coal from mines and cotton from America to come together to manufacture products.</li> <li>Transport only developed because industry needed it. 
It was slow to develop as money was spent on improving roads, then building canals and the replacing them with railways in order to keep up with industry.</li> </ol> <p>Now try to think of 2 more statements of your own.</p> </div> <div class="foot"></div> </div> <h2>Main activity</h2> <div id="tab2-content" class="tabContent"> <div class="head"></div> <div class="body"><div class="subject-header">Learning path: History</div> <h3>Learning Objective</h3> <ul> <li>To select evidence to support points</li> </ul> <h3>What to do?</h3> <!--<ul> <li>Watch the clip: <em>Windmill and water mill</em></li> </ul>--> <ul><li>Choose the 4 points that you think are most important - try to be balanced by having two <strong>for</strong> and two <strong>against</strong>.</li> <li>Write one in each of the point boxes of the paragraphs on the sheet <a href="lp_history_industry_transport_ws1.html" class="link-internal">Constructing a balanced argument</a>.</li></ul> <p>You might like to re write the points in your own words and use connectives
[jira] [Updated] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-1604: --- Attachment: ComplexPhrase.zip Update for Solr 3.3.0: * Download apache-solr-3.3.0-src.tgz * Download the latest ComplexPhrase.zip * 'mvn package' will generate 3 files under the target folder; copy them to apache-solr-3.3.0/solr/lib/ ** cp target/ComplexPhrase-* Downloads/apache-solr-3.3.0/solr/lib/ * call 'ant clean dist' to create a new apache-solr-3.3-SNAPSHOT.war file under the solr/dist folder Wildcards, ORs etc inside Phrase Queries Key: SOLR-1604 URL: https://issues.apache.org/jira/browse/SOLR-1604 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Ahmet Arslan Priority: Minor Fix For: 3.4, 4.0 Attachments: ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhraseQueryParser.java, SOLR-1604.patch Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports wildcards, ORs, ranges, fuzzies inside phrase queries.
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators
[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068925#comment-13068925 ] Ahmet Arslan commented on SOLR-2649: I experienced the same issue. When I added one negative clause to the query string (which has two optional clauses), mm is ignored and the default operator is used instead. q=word1 word2 -word3&mm=100%&defType=edismax and q=word1 word2 -word3&mm=100%&defType=dismax return different result sets. edismax returns documents containing either word1 or word2, although there are two optional clauses in the query and mm is set to 100%. MM ignored in edismax queries with operators Key: SOLR-2649 URL: https://issues.apache.org/jira/browse/SOLR-2649 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.3 Reporter: Magnus Bergmark Priority: Minor Hypothetical scenario: 1. User searches for stocks oil gold with MM set to 50% 2. User adds -stockings to the query: stocks oil gold -stockings 3. User gets no hits since MM was ignored and all terms were AND-ed together The behavior seems to be intentional, although the reason why is never explained: // For correct lucene queries, turn off mm processing if there // were explicit operators (except for AND). boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; (lines 232-234 taken from tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java) This makes edismax unsuitable as a replacement for dismax; mm is one of the primary features of dismax.
[jira] Commented: (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859048#action_12859048 ] Ahmet Arslan commented on SOLR-1604: To enable ComplexPhraseQueryParser in Solr 1.4.0 - Revisited Due to the reasons revealed here [1], this plugin should be loaded using the old way [2] [1] http://search-lucene.com/m/E49gN1naPyh [2] http://wiki.apache.org/solr/SolrPlugins#The_Old_Way 1-) extract ComplexPhrase.zip and run 'mvn package' 2-) unzip apache-solr-1.4.0.zip and copy ComplexPhrase/target/ComplexPhrase-1.0.jar to the apache-solr-1.4.0/lib directory. 3-) create a new apache-solr-1.4.0\dist\apache-solr-1.4.1-dev.war (by running 'ant dist') and use it. 4-) register the query parser in solrhome/conf/solrconfig.xml by adding <queryParser name="complexphrase" class="org.apache.solr.search.ComplexPhraseQParserPlugin"/> 5-) enable it by appending &defType=complexphrase to the search URL. 6-) Alternatively you can add {!complexphrase} in front of your query string, e.g. q={!complexphrase}s* b* 7-) More permanent usage can be configured in solrconfig.xml: <requestHandler name="standard" class="solr.StandardRequestHandler" default="true"> <lst name="defaults"> <str name="defType">complexphrase</str> </lst> </requestHandler>
[jira] [Updated] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-1604: --- Attachment: ComplexPhrase.zip Includes README.txt that contains instructions for Solr 4.0.0.
[jira] [Commented] (SOLR-3000) Add support for ComplexPhraseQueryParser
[ https://issues.apache.org/jira/browse/SOLR-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179338#comment-13179338 ] Ahmet Arslan commented on SOLR-3000: I think this is a duplicate of [SOLR-1604|https://issues.apache.org/jira/browse/SOLR-1604]? Add support for ComplexPhraseQueryParser Key: SOLR-3000 URL: https://issues.apache.org/jira/browse/SOLR-3000 Project: Solr Issue Type: New Feature Reporter: Santiago M. Mola It would be useful to have support for queries such as "my phrse"~0.5 "queri~0.5"~2, as those provided by Lucene's ComplexPhraseQueryParser.
[jira] [Created] (SOLR-3060) add highlighter support to SurroundQParserPlugin
add highlighter support to SurroundQParserPlugin - Key: SOLR-3060 URL: https://issues.apache.org/jira/browse/SOLR-3060 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.0 Reporter: Ahmet Arslan Priority: Minor Fix For: 4.0 Highlighter does not recognize the SrndQuery family. http://search-lucene.com/m/FuDsU1sTjgM http://search-lucene.com/m/wD8c11gNTb61
[jira] [Updated] (SOLR-3060) add highlighter support to SurroundQParserPlugin
[ https://issues.apache.org/jira/browse/SOLR-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-3060: --- Attachment: SOLR-3060.patch o.a.s.search.QParser#getHighlightQuery() method is overridden.
[jira] [Updated] (SOLR-3060) add highlighter support to SurroundQParserPlugin
[ https://issues.apache.org/jira/browse/SOLR-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-3060: --- Attachment: SOLR-3060.patch Some tests added.
[jira] [Commented] (SOLR-3060) add highlighter support to SurroundQParserPlugin
[ https://issues.apache.org/jira/browse/SOLR-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195979#comment-13195979 ] Ahmet Arslan commented on SOLR-3060: The following commands should do it. * svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk * cd trunk * curl -O https://issues.apache.org/jira/secure/attachment/12511843/SOLR-3060.patch * patch -p0 -i SOLR-3060.patch * cd solr * ant clean dist Use the newly created trunk/solr/dist/apache-solr-4.0-SNAPSHOT.war file.
[jira] [Created] (SOLR-3074) Bug in SolrPluginUtilsTest
Bug in SolrPluginUtilsTest -- Key: SOLR-3074 URL: https://issues.apache.org/jira/browse/SOLR-3074 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.0 Reporter: Ahmet Arslan Priority: Minor Fix For: 4.0 testDocListConversion() is not testing what it's supposed to test, because the added test documents are not committed. http://search-lucene.com/m/uwh9l2SHH4e
[jira] [Updated] (SOLR-3074) Bug in SolrPluginUtilsTest
[ https://issues.apache.org/jira/browse/SOLR-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-3074: --- Attachment: SOLR-3074.patch
[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199774#comment-13199774 ] Ahmet Arslan commented on SOLR-1604: The committed o.a.l.queryparser.complexPhrase.ComplexPhraseQueryParser does not work with non-default fields. Several Lucene users have raised this issue on the mailing lists. Mark Harwood said the following on LUCENE-1486, which is still unresolved; however, it didn't get any attention. {quote}Fixing this would require changing the package name of ComplexPhraseQueryParser or changing the visibility of field in the QueryParser base class to protected. Anyone have any strong feelings about which of these is the most acceptable?{quote} That's why the attachment on this issue does not consume the committed o.a.l.queryparser.complexPhrase.ComplexPhraseQueryParser and is released as a Solr plugin.
[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200020#comment-13200020 ]

Ahmet Arslan commented on SOLR-1604:
------------------------------------

I imagined that LUCENE-1486 would be closed/fixed in the future, hopefully including the non-default-field patch. Are you saying that the non-default-field problem should be handled in a separate issue (other than LUCENE-1486)?

> Wildcards, ORs etc inside Phrase Queries
>                 Key: SOLR-1604
>                 URL: https://issues.apache.org/jira/browse/SOLR-1604
[jira] [Updated] (LUCENE-1486) Wildcards, ORs etc inside Phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmet Arslan updated LUCENE-1486:
---------------------------------
    Attachment: LUCENE-1486.patch

Mark's and Tomas' non-default-field patches are combined.

> Wildcards, ORs etc inside Phrase queries
> ----------------------------------------
>
>                 Key: LUCENE-1486
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1486
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/queryparser
>    Affects Versions: 2.4
>            Reporter: Mark Harwood
>            Priority: Minor
>             Fix For: 4.0
>         Attachments: ComplexPhraseQueryParser.java, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, Lucene-1486 non default field.patch, TestComplexPhraseQuery.java, junit_complex_phrase_qp_07_21_2009.patch, junit_complex_phrase_qp_07_22_2009.patch
>
> An extension to the default QueryParser that overrides the parsing of PhraseQueries to allow more complex syntax, e.g. wildcards in phrase queries. The implementation feels a little hacky - this is arguably better handled in QueryParser itself. This works as a proof of concept for much of the query parser syntax. Examples from the JUnit test include:
>   checkMatches("\"j* smyth~\"", 1, 2);        // wildcards and fuzzies are OK in phrases
>   checkMatches("\"(jo* -john) smith\"", 2);   // boolean logic works
>   checkMatches("\"jo* smith\"~2", 1, 2, 3);   // position logic works
>   checkBadQuery("\"jo* id:1 smith\"");        // mixing fields in a phrase is bad
>   checkBadQuery("\"jo* \"smith\" \"");        // phrases inside phrases is bad
>   checkBadQuery("\"jo* [sma TO smZ]\" \"");   // range queries inside phrases not supported
> Code plus JUnit test to follow...
[jira] [Commented] (LUCENE-1486) Wildcards, ORs etc inside Phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203514#comment-13203514 ]

Ahmet Arslan commented on LUCENE-1486:
--------------------------------------

Thanks for looking into this, Mark and Tomas. Do you think this issue is the right place to introduce a boolean inOrder parameter? Currently inOrder=true is always passed to SpanNearQuery's constructor.

> Wildcards, ORs etc inside Phrase queries
>                 Key: LUCENE-1486
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1486
[jira] [Updated] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmet Arslan updated SOLR-1604:
-------------------------------
    Attachment: SOLR-1604.patch

Patch for trunk.

> Wildcards, ORs etc inside Phrase Queries
>                 Key: SOLR-1604
>                 URL: https://issues.apache.org/jira/browse/SOLR-1604
[jira] [Updated] (LUCENE-3758) Allow the ComplexPhraseQueryParser to search order or un-order proximity queries.
[ https://issues.apache.org/jira/browse/LUCENE-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmet Arslan updated LUCENE-3758:
---------------------------------
    Attachment: LUCENE-3758.patch

Patch for trunk.

> Allow the ComplexPhraseQueryParser to search order or un-order proximity queries.
> ---------------------------------------------------------------------------------
>
>                 Key: LUCENE-3758
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3758
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/queryparser
>    Affects Versions: 4.0
>            Reporter: Tomás Fernández Löbbe
>            Priority: Minor
>             Fix For: 4.0
>         Attachments: LUCENE-3758.patch
>
> The ComplexPhraseQueryParser uses SpanNearQuery, but always sets the inOrder value hardcoded to true. This could be configurable.
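The semantics the inOrder flag controls can be sketched in plain Java (this is a hypothetical helper for illustration, not Lucene's SpanNearQuery implementation): an ordered proximity match requires the term positions to appear in query order within the slop window, while an unordered match only requires them to fall inside the window.

```java
import java.util.List;

public class ProximitySketch {
    // Returns true when term positions (given in query order) match within `slop`.
    // With inOrder=true the positions must be strictly increasing (document order
    // must follow query order); with inOrder=false any order in the window is fine.
    static boolean matches(List<Integer> positions, int slop, boolean inOrder) {
        int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
        for (int i = 0; i < positions.size(); i++) {
            int p = positions.get(i);
            if (inOrder && i > 0 && p <= positions.get(i - 1)) {
                return false; // terms appear out of query order
            }
            min = Math.min(min, p);
            max = Math.max(max, p);
        }
        // gap left over in the window after placing the terms must fit in slop
        return (max - min) - (positions.size() - 1) <= slop;
    }

    public static void main(String[] args) {
        // adjacent, in order: matches either way with slop 0
        System.out.println(matches(List.of(3, 4), 0, true));   // true
        // reversed in the document: only the unordered variant matches
        System.out.println(matches(List.of(4, 3), 0, true));   // false
        System.out.println(matches(List.of(4, 3), 0, false));  // true
    }
}
```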
[jira] [Created] (SOLR-3193) highlighting on an unindexed field throws InvalidTokenOffsetsException
highlighting on an unindexed field throws InvalidTokenOffsetsException
----------------------------------------------------------------------

                 Key: SOLR-3193
                 URL: https://issues.apache.org/jira/browse/SOLR-3193
             Project: Solr
          Issue Type: Bug
          Components: highlighter
    Affects Versions: 3.6
            Reporter: Ahmet Arslan
            Priority: Minor

When highlighting is requested on an un-indexed field (for the second time), InvalidTokenOffsetsException is thrown. http://lucene.472066.n3.nabble.com/search-highlight-InvalidTokenOffsetsException-in-Solr-3-5-td3560997.html#a3793593
[jira] [Updated] (SOLR-3193) highlighting on an unindexed field throws InvalidTokenOffsetsException
[ https://issues.apache.org/jira/browse/SOLR-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmet Arslan updated SOLR-3193:
-------------------------------
    Attachment: SOLR-3193.patch

Test case that demonstrates the bug.

> highlighting on an unindexed field throws InvalidTokenOffsetsException
>                 Key: SOLR-3193
>                 URL: https://issues.apache.org/jira/browse/SOLR-3193
[jira] [Commented] (SOLR-3193) highlighting on an unindexed field throws InvalidTokenOffsetsException
[ https://issues.apache.org/jira/browse/SOLR-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221083#comment-13221083 ]

Ahmet Arslan commented on SOLR-3193:
------------------------------------

If solr.ReversedWildcardFilterFactory is removed from the index analyzer, the attached test passes.

> highlighting on an unindexed field throws InvalidTokenOffsetsException
>                 Key: SOLR-3193
>                 URL: https://issues.apache.org/jira/browse/SOLR-3193
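For context, the comment refers to a fieldType of roughly this shape (a hypothetical schema.xml sketch; the field and analyzer names are illustrative, the placement of the filter in the index analyzer is the point):

{code:xml}
<fieldType name="text_rev" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- removing this filter from the index analyzer makes the attached test pass -->
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
{code}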
[jira] Updated: (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmet Arslan updated SOLR-1604:
-------------------------------
    Attachment: ComplexPhrase.zip

There is a need for un-ordered proximity search: http://search-lucene.com/m/3W9fj2yzNy82/

A configurable inOrder parameter is added; the default behavior is {color:blue} true {color}. The configuration below can be used to obtain the same behavior as [PhraseQuery|http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/PhraseQuery.html], in which the order of terms is not important.

{code:xml}
<queryParser name="complexphrase" class="org.apache.solr.search.ComplexPhraseQParserPlugin">
  <bool name="inOrder">false</bool>
</queryParser>
{code}

> Wildcards, ORs etc inside Phrase Queries
>                 Key: SOLR-1604
>                 URL: https://issues.apache.org/jira/browse/SOLR-1604
[jira] Updated: (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmet Arslan updated SOLR-1604:
-------------------------------
    Attachment: ComplexPhrase.zip

A hyphen inside the phrase causes an exception, e.g. "sulfur-reducing bacteria". Terje Eggestad's [fix|https://issues.apache.org/jira/browse/LUCENE-1486?focusedCommentId=12900278&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12900278] is added.

> Wildcards, ORs etc inside Phrase Queries
>                 Key: SOLR-1604
>                 URL: https://issues.apache.org/jira/browse/SOLR-1604
[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
Ahmet Arslan commented on SOLR-1604:
------------------------------------

{quote}Could I get the grammar file (.jj file) for the ComplexPhrase one? It's not there as part of the patch/zip file.{quote}

It does not have a separate grammar file. It just extends QueryParser.

> Wildcards, ORs etc inside Phrase Queries
>                 Key: SOLR-1604
[jira] [Created] (SOLR-3759) mistakes about example-DIH
Ahmet Arslan created SOLR-3759:
-------------------------------

             Summary: mistakes about example-DIH
          Issue Type: Bug
            Assignee: Unassigned
          Components: contrib - DataImportHandler, documentation
             Created: 26/Aug/12 17:23
             Project: Solr
            Priority: Minor
            Reporter: Ahmet Arslan

Description: The mail core's solrconfig.xml lacks a lib directive for contrib/extraction/lib.
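A sketch of the kind of directive the report says is missing (the relative path is illustrative and depends on where the example-DIH cores live in the checkout):

{code:xml}
<!-- in example-DIH/solr/mail/conf/solrconfig.xml -->
<lib dir="../../../../contrib/extraction/lib" regex=".*\.jar" />
{code}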
[jira] [Updated] (SOLR-3759) mistakes about example-DIH
Ahmet Arslan updated SOLR-3759:
-------------------------------
    Attachment: SOLR-3759.patch

Change By: Ahmet Arslan (26/Aug/12 17:24)
[jira] [Updated] (SOLR-3759) mistakes about example-DIH
Ahmet Arslan updated SOLR-3759:
-------------------------------
    Attachment: SOLR-3759.patch

Also missing: AdminHandlers for the tika core, and PingRequestHandler for all cores.

Change By: Ahmet Arslan (26/Aug/12 17:49)
[jira] [Updated] (SOLR-3759) mistakes about example-DIH
[ https://issues.apache.org/jira/browse/SOLR-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmet Arslan updated SOLR-3759:
-------------------------------
    Fix Version/s: 4.0

> mistakes about example-DIH
> --------------------------
>
>                 Key: SOLR-3759
>                 URL: https://issues.apache.org/jira/browse/SOLR-3759
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler, documentation
>            Reporter: Ahmet Arslan
>            Priority: Minor
>             Fix For: 4.0
>         Attachments: SOLR-3759.patch, SOLR-3759.patch
>
> mail core's solrconfig.xml lacks lib directive for contrib/extraction/lib.
[jira] [Created] (SOLR-3779) LineEntityProcessor processes only one document
Ahmet Arslan created SOLR-3779:
-------------------------------

             Summary: LineEntityProcessor processes only one document
                 Key: SOLR-3779
                 URL: https://issues.apache.org/jira/browse/SOLR-3779
             Project: Solr
          Issue Type: Bug
          Components: contrib - DataImportHandler
    Affects Versions: 4.0-BETA
            Reporter: Ahmet Arslan
             Fix For: 4.0

LineEntityProcessor processes only one document when combined with FileListEntityProcessor.

{code:xml}
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" name="fds"/>
  <document>
    <entity name="f" processor="FileListEntityProcessor" fileName=".*txt"
            baseDir="/Volumes/data/Documents" recursive="false" rootEntity="false"
            dataSource="null" transformer="TemplateTransformer">
      <entity onError="skip" name="jc" processor="LineEntityProcessor"
              url="${f.fileAbsolutePath}" dataSource="fds" rootEntity="true"
              transformer="TemplateTransformer">
        <field column="link" template="hello${f.fileAbsolutePath},${jc.rawLine}"/>
        <field column="rawLine" name="rawLine"/>
      </entity>
    </entity>
  </document>
</dataConfig>
{code}