Hi Karl,
unfortunately I'm busy at the moment, but I'll try to test it and let you know.
Thanks,
L. Alciata

2016-05-09 18:04 GMT+02:00 Karl Wright <[email protected]>:

> Hi Luca,
>
> I've put together code that should allow multivalued attributes to be
> crawled. In order to try it, you will need to check out the
> CONNECTORS-1313 branch:
>
> svn checkout
> https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1313
>
> Then, build:
>
> ant make-core-deps
> ant build
>
> Please give this a try and see if it works for you.
>
> Thanks,
> Karl
>
>
> On Fri, May 6, 2016 at 10:15 AM, Luca Alicata <[email protected]> wrote:
>
>> Hi Karl,
>> I can confirm that it is a little expensive, but at that time, i haven't
>> much time, and i stop to work after found the solution.
>> Thanks for the creation of the ticket, for the moment, i try to use
>> generic connector.
>>
>> An other question, there is another connector that can use an application
>> to receive data? Like GenericConnector?
>>
>> Thanks,
>> L. Alicata
>>
>> 2016-05-06 16:02 GMT+02:00 Karl Wright <[email protected]>:
>>
>>> Hi Luca,
>>>
>>> This approach causes each document's binary data to be read more than
>>> once. I think that is expensive, especially if there are a lot of values
>>> for a row.
>>>
>>> Instead I think something more like ACLs will be needed -- that is, a
>>> separate query for each multi-valued field. This is more work but it would
>>> work much better.
>>>
>>> I will create a ticket to add this to the JDBC connector, but it won't
>>> happen for a while.
>>>
>>> Karl
>>>
>>>
>>> On Fri, May 6, 2016 at 9:40 AM, Luca Alicata <[email protected]> wrote:
>>>
>>>> I've decompile java connector and modified the code in this way:
>>>>
>>>> in process document, i see that just currently arrive all row of query
>>>> result (also multi values row), but in the cycle that parse document, after
>>>> first document with an ID, all the other with the same are skipped.
>>>> So i removed the control that not permits to check other document with
>>>> the same ID and i modified the method that store metadata, to permit to
>>>> store multi value data as array in metadata mapping.
>>>>
>>>> I attached the code in this e-mail. You can find a comment that start
>>>> with "---", that i insert know for you.
>>>>
>>>> Thanks,
>>>> L. Alicata
>>>>
>>>> 2016-05-06 15:25 GMT+02:00 Karl Wright <[email protected]>:
>>>>
>>>>> Ok, it's now clear what you are looking for, but it is still not clear
>>>>> how we'd integrate that in the JDBC connector. How did you do this when
>>>>> you modified the connector for 1.8?
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Fri, May 6, 2016 at 9:21 AM, Luca Alicata <[email protected]> wrote:
>>>>>
>>>>>> Hi Karl,
>>>>>> sorry for my english :).
>>>>>> I mean the fact that i've to extract value from query with a join
>>>>>> between two table with a relationship of one-to-many, the dataset
>>>>>> returned from Connector is only one pair from the two table.
>>>>>>
>>>>>> For example:
>>>>>> Table A with persons
>>>>>> Table B with eyes
>>>>>>
>>>>>> As result of join, i aspect have two row like:
>>>>>> person 1, eye left
>>>>>> person 1, eye right
>>>>>>
>>>>>> but the connector returns only one row:
>>>>>> person 1, eye left
>>>>>>
>>>>>> I hope now it's more clear.
>>>>>>
>>>>>> Ps. i report the phrase on Manifold documentation that explain that
>>>>>> (https://manifoldcf.apache.org/release/release-2.3/en_US/end-user-documentation.html#jdbcrepository):
>>>>>> ------
>>>>>> There is currently no support in the JDBC connection type for
>>>>>> natively handling multi-valued metadata.
>>>>>> ------
>>>>>>
>>>>>> Thanks,
>>>>>> L. Alicata
>>>>>>
>>>>>>
>>>>>> 2016-05-06 15:10 GMT+02:00 Karl Wright <[email protected]>:
>>>>>>
>>>>>>> Hi Luca,
>>>>>>>
>>>>>>> It is not clear what you mean by "multi value extraction" using the
>>>>>>> JDBC connector.
>>>>>>> The JDBC connector allows collection of primary binary
>>>>>>> content as well as metadata from a database row. So maybe if you can
>>>>>>> explain what you need beyond that it would help.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Fri, May 6, 2016 at 9:04 AM, Luca Alicata <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Karl,
>>>>>>>> thanks for information, fortunately in other jboss instance i have
>>>>>>>> a old Manifold configuration with single process, that i've dismissed.
>>>>>>>> But in this moment, i start to test this jobs with that and if it work
>>>>>>>> fine, i can use it only for this job and use it also in production.
>>>>>>>> Maybe after, if i can, i try to check the possible problem that stop
>>>>>>>> the agent.
>>>>>>>>
>>>>>>>> I Take advantage of this discussion to ask you, if multi-value
>>>>>>>> extraction from db is consider as possible future work or no. Because
>>>>>>>> i've used this generi connector to resolve this lack of JDBC Connector.
>>>>>>>> In fact with Manifold 1.8 i've modified the connector to support this
>>>>>>>> behavior (in addiction to parse blob file), but upgrade Manifold
>>>>>>>> Version, to not rewrite the new connector i decide to use Generic
>>>>>>>> Connector with application that do the work of extraction data from DB.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> L. Alicata
>>>>>>>>
>>>>>>>> 2016-05-06 14:42 GMT+02:00 Karl Wright <[email protected]>:
>>>>>>>>
>>>>>>>>> Hi Luca,
>>>>>>>>>
>>>>>>>>> If you do a lock clean and the process still stops, then the locks
>>>>>>>>> are not the problem.
>>>>>>>>>
>>>>>>>>> One way we can drill down into the problem is to get a thread dump
>>>>>>>>> of the agents process after it stops. The thread dump must be of the
>>>>>>>>> agents process, not any of the others.
>>>>>>>>>
>>>>>>>>> FWIW, the generic connector is not well supported; the person who
>>>>>>>>> wrote it is still a committer but is not actively involved in MCF
>>>>>>>>> development at this time. I suspect that the problem may have to do
>>>>>>>>> with how that connector deals with exceptions or errors, but I am
>>>>>>>>> not sure.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, May 6, 2016 at 8:38 AM, Luca Alicata <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Karl,
>>>>>>>>>> I've just tried with lock-clean after agents stop to work,
>>>>>>>>>> obviously after stopping process. After this, job start correctly,
>>>>>>>>>> but just second time that i start a job with a lot of data (or
>>>>>>>>>> sometimes the third time), agent stop again.
>>>>>>>>>>
>>>>>>>>>> Unfortunately, it's difficult start, for the moment, to using
>>>>>>>>>> Zookeeper in this environment, but this can resolve the fact that
>>>>>>>>>> during working agents stop to work? or help only for cleaning lock
>>>>>>>>>> agent when i restart the process?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> L. Alicata
>>>>>>>>>>
>>>>>>>>>> 2016-05-06 14:15 GMT+02:00 Karl Wright <[email protected]>:
>>>>>>>>>>
>>>>>>>>>>> Hi Luca,
>>>>>>>>>>>
>>>>>>>>>>> With file-based synchronization, if you kill any of the
>>>>>>>>>>> processes involved, you will need to execute the lock-clean
>>>>>>>>>>> procedure to make sure you have no dangling locks in the file
>>>>>>>>>>> system.
>>>>>>>>>>>
>>>>>>>>>>> - shut down all MCF processes (except the database)
>>>>>>>>>>> - run the lock-clean script
>>>>>>>>>>> - start your MCF processes back up
>>>>>>>>>>>
>>>>>>>>>>> I suspect what you are seeing is related to this.
>>>>>>>>>>>
>>>>>>>>>>> Also, please consider using Zookeeper instead, since it is more
>>>>>>>>>>> robust about cleaning out dangling locks.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, May 6, 2016 at 8:06 AM, Luca Alicata <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>> thanks for help.
>>>>>>>>>>>> In my case i've only one instance of MCF running, with both
>>>>>>>>>>>> type of job (SP and Generic), and so i have only one properties
>>>>>>>>>>>> files (that i have attached).
>>>>>>>>>>>> For information i used (multiprocess-file configuration) with
>>>>>>>>>>>> postgres.
>>>>>>>>>>>>
>>>>>>>>>>>> Do you have other suggestions? do you need more information,
>>>>>>>>>>>> that i can give you?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> L.Alicata
>>>>>>>>>>>>
>>>>>>>>>>>> 2016-05-06 12:55 GMT+02:00 Karl Wright <[email protected]>:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Luca,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you have multiple independent MCF clusters running at the
>>>>>>>>>>>>> same time? It sounds like you do: you have SP on one, and
>>>>>>>>>>>>> Generic on another. If so, you will need to be sure that the
>>>>>>>>>>>>> synchronization you are using (either zookeeper or file-based)
>>>>>>>>>>>>> does not overlap. Each cluster needs its own synchronization.
>>>>>>>>>>>>> If there is overlap, then doing things with one cluster may
>>>>>>>>>>>>> cause the other cluster to hang. This also means you have to
>>>>>>>>>>>>> have different properties files for the two clusters, of
>>>>>>>>>>>>> course.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, May 6, 2016 at 4:32 AM, Luca Alicata <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> i'm using Manifold 2.2 with multi-process configuration in
>>>>>>>>>>>>>> Jboss instance inside a Windows Server 2012 and i've a set of
>>>>>>>>>>>>>> job that work with Sharepoint (SP) or Generic Connector (GC),
>>>>>>>>>>>>>> that get file from a db.
>>>>>>>>>>>>>> With SP i've no problem, while with GC with a lot of document
>>>>>>>>>>>>>> (one with 47k and another with 60k), the Seed taking process,
>>>>>>>>>>>>>> sometimes, not finish, because the agents seem to stop
>>>>>>>>>>>>>> (although java process is still alive).
>>>>>>>>>>>>>> After this, if i try to start any other job, that not start,
>>>>>>>>>>>>>> like the agents are stopped.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Other times, this jobs work correctly and one time together
>>>>>>>>>>>>>> work correctly, running in the same moment.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For information:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - On Jboss there are only Manifold and Generic Repository
>>>>>>>>>>>>>> application.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - On the same Virtual Server, there is another Jboss
>>>>>>>>>>>>>> istance, with solr istance and a web application.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - I've check if it was a type of memory problem, but it's
>>>>>>>>>>>>>> not the case.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - GC with almost 23k seed work always, at least in test
>>>>>>>>>>>>>> that i've done.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - In local instance of Jboss with Manifold and Generic
>>>>>>>>>>>>>> Rpository Application, i've not keep this problem.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is the only recurrent information that i've seen on
>>>>>>>>>>>>>> manifold.log:
>>>>>>>>>>>>>> ---------------
>>>>>>>>>>>>>> Connection 0.0.0.0:62755<-><ip-address>:<port> shut down
>>>>>>>>>>>>>> Releasing connection org.apache.http.impl.conn.ManagedClientConnectionImpl@6c98c1bd
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ---------------
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> L. Alicata
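
The one-to-many case discussed in this thread (person 1 with a left and a right eye) can be illustrated outside ManifoldCF. What follows is only a minimal sketch in Python against an in-memory SQLite database; it is not the code from the CONNECTORS-1313 branch and not the JDBC connector's API. The table and column names (persons, eyes, person_id, side) and the function names are hypothetical. It runs one query that returns a single row per document ID and a separate query for the multi-valued field, which is the general shape Karl describes as "more like ACLs".

```python
import sqlite3


def fetch_documents(conn):
    """Return {person_id: metadata}, with the multi-valued field as a list."""
    docs = {}
    # Main query: exactly one row per document ID, so no row is skipped.
    for person_id, name in conn.execute(
        "SELECT id, name FROM persons ORDER BY id"
    ):
        docs[person_id] = {"name": name, "eye": []}
    # Separate query for the multi-valued field (hypothetical 'eyes' table).
    for person_id, side in conn.execute(
        "SELECT person_id, side FROM eyes ORDER BY person_id, side"
    ):
        if person_id in docs:
            docs[person_id]["eye"].append(side)
    return docs


def demo():
    """Build the persons/eyes example from the thread and collect metadata."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(
            """
            CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT);
            CREATE TABLE eyes (person_id INTEGER, side TEXT);
            INSERT INTO persons VALUES (1, 'person 1'), (2, 'person 2');
            INSERT INTO eyes VALUES (1, 'left'), (1, 'right'), (2, 'left');
            """
        )
        return fetch_documents(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    for doc_id, metadata in demo().items():
        print(doc_id, metadata)
```

Because the main query never joins against the many side, each document's primary content would be read only once, and the number of extra queries grows with the number of multi-valued fields rather than with the number of values.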
