Re: Job with Generic Connector stop to work

Luca Alicata Fri, 06 May 2016 06:21:54 -0700

Hi Karl,
sorry for my English :).
What I mean is this: I have to extract values with a query that joins two
tables with a one-to-many relationship, but the dataset returned by the
connector contains only one pair of matching records from the two tables.


For example:
Table A with persons
Table B with eyes

As the result of the join, I expect two rows like:
person 1, eye left
person 1, eye right

but the connector returns only one row:
person 1, eye left
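
To make the expected behavior concrete, here is a minimal Java sketch. It is
not ManifoldCF code: the class `MultiValueFold`, the `JoinRow` type and the
`fold` method are hypothetical names for illustration only. It collapses the
rows of a one-to-many join into one entry per person, keeping every value
from table B as a multi-valued field instead of a single value:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiValueFold {
    // Hypothetical join row: one person (table A) paired with one eye (table B).
    static final class JoinRow {
        final String personId;
        final String eye;

        JoinRow(String personId, String eye) {
            this.personId = personId;
            this.eye = eye;
        }
    }

    // Groups the join rows by person, keeping every value from table B
    // in query order, rather than only the first matching row.
    static Map<String, List<String>> fold(List<JoinRow> rows) {
        Map<String, List<String>> byPerson = new LinkedHashMap<>();
        for (JoinRow row : rows) {
            byPerson.computeIfAbsent(row.personId, k -> new ArrayList<>()).add(row.eye);
        }
        return byPerson;
    }

    public static void main(String[] args) {
        List<JoinRow> rows = Arrays.asList(
                new JoinRow("person 1", "eye left"),
                new JoinRow("person 1", "eye right"));
        // Prints {person 1=[eye left, eye right]}
        System.out.println(fold(rows));
    }
}
```

With the two rows from the example, person 1 ends up with both values; a
connector would then need to send that list as one multi-valued metadata
field for the document.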

I hope it's clearer now.

P.S. Here is the sentence from the ManifoldCF documentation that explains this
(https://manifoldcf.apache.org/release/release-2.3/en_US/end-user-documentation.html#jdbcrepository):
------
There is currently no support in the JDBC connection type for natively
handling multi-valued metadata.
------

Thanks,
L. Alicata


2016-05-06 15:10 GMT+02:00 Karl Wright <[email protected]>:

> Hi Luca,
>
> It is not clear what you mean by "multi value extraction" using the JDBC
> connector.  The JDBC connector allows collection of primary binary content
> as well as metadata from a database row.  So maybe if you can explain what
> you need beyond that it would help.
>
> Thanks,
> Karl
>
>
> On Fri, May 6, 2016 at 9:04 AM, Luca Alicata <[email protected]>
> wrote:
>
>> Hi Karl,
>> thanks for the information. Fortunately, in another JBoss instance I have
>> an old single-process Manifold configuration that I had decommissioned. I
>> am now testing these jobs with it, and if it works fine, I can use it just
>> for these jobs, in production as well. Later, if I can, I will try to track
>> down the problem that stops the agents.
>>
>> I'd also like to take advantage of this discussion to ask whether
>> multi-value extraction from a DB is being considered as possible future
>> work. I've used the Generic Connector to work around this gap in the JDBC
>> Connector. In fact, with Manifold 1.8 I had modified the JDBC Connector to
>> support this behavior (in addition to parsing blob files), but when
>> upgrading the Manifold version, rather than rewrite the modified connector,
>> I decided to use the Generic Connector with an application that does the
>> work of extracting the data from the DB.
>>
>> Thanks,
>> L. Alicata
>>
>> 2016-05-06 14:42 GMT+02:00 Karl Wright <[email protected]>:
>>
>>> Hi Luca,
>>>
>>> If you do a lock clean and the process still stops, then the locks are
>>> not the problem.
>>>
>>> One way we can drill down into the problem is to get a thread dump of
>>> the agents process after it stops.  The thread dump must be of the agents
>>> process, not any of the others.
>>>
>>> FWIW, the generic connector is not well supported; the person who wrote
>>> it is still a committer but is not actively involved in MCF development at
>>> this time.  I suspect that the problem may have to do with how that
>>> connector deals with exceptions or errors, but I am not sure.
>>>
>>> Thanks,
>>>
>>> Karl
>>>
>>>
>>> On Fri, May 6, 2016 at 8:38 AM, Luca Alicata <[email protected]>
>>> wrote:
>>>
>>>> Hi Karl,
>>>> I've just tried running lock-clean after the agents stopped working
>>>> (after stopping the processes, of course). After that, the job starts
>>>> correctly, but the second time I start a job with a lot of data (or
>>>> sometimes the third time), the agents stop again.
>>>>
>>>> Unfortunately, for the moment it's difficult to start using Zookeeper
>>>> in this environment. Would it solve the agents stopping while they are
>>>> working, or would it only help with cleaning up agent locks when I
>>>> restart the processes?
>>>>
>>>> Thanks,
>>>> L. Alicata
>>>>
>>>> 2016-05-06 14:15 GMT+02:00 Karl Wright <[email protected]>:
>>>>
>>>>> Hi Luca,
>>>>>
>>>>> With file-based synchronization, if you kill any of the processes
>>>>> involved, you will need to execute the lock-clean procedure to make sure
>>>>> you have no dangling locks in the file system.
>>>>>
>>>>> - shut down all MCF processes (except the database)
>>>>> - run the lock-clean script
>>>>> - start your MCF processes back up
>>>>>
>>>>> I suspect what you are seeing is related to this.
>>>>>
>>>>> Also, please consider using Zookeeper instead, since it is more robust
>>>>> about cleaning out dangling locks.
>>>>>
>>>>> Thanks,
>>>>> Karl
>>>>>
>>>>>
>>>>> On Fri, May 6, 2016 at 8:06 AM, Luca Alicata <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Karl,
>>>>>> thanks for the help.
>>>>>> In my case I have only one instance of MCF running, with both types of
>>>>>> job (SP and Generic), so I have only one properties file (which I have
>>>>>> attached).
>>>>>> For information, I'm using the multiprocess-file configuration with
>>>>>> PostgreSQL.
>>>>>>
>>>>>> Do you have any other suggestions? Is there more information I can
>>>>>> give you?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> L.Alicata
>>>>>>
>>>>>> 2016-05-06 12:55 GMT+02:00 Karl Wright <[email protected]>:
>>>>>>
>>>>>>> Hi Luca,
>>>>>>>
>>>>>>> Do you have multiple independent MCF clusters running at the same
>>>>>>> time?  It sounds like you do: you have SP on one, and Generic on
>>>>>>> another.  If so, you will need to be sure that the synchronization you
>>>>>>> are using (either zookeeper or file-based) does not overlap.  Each
>>>>>>> cluster needs its own synchronization.  If there is overlap, then doing
>>>>>>> things with one cluster may cause the other cluster to hang.  This also
>>>>>>> means you have to have different properties files for the two clusters,
>>>>>>> of course.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, May 6, 2016 at 4:32 AM, Luca Alicata <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I'm using Manifold 2.2 with the multi-process configuration in a JBoss
>>>>>>>> instance on Windows Server 2012, and I have a set of jobs that work
>>>>>>>> with SharePoint (SP) or the Generic Connector (GC); the GC jobs get
>>>>>>>> files from a DB.
>>>>>>>> With SP I have no problems, while with GC, on jobs with a lot of
>>>>>>>> documents (one with 47k and another with 60k), the seeding process
>>>>>>>> sometimes does not finish, because the agents seem to stop (although
>>>>>>>> the Java process is still alive).
>>>>>>>> After this, if I try to start any other job, it does not start, as if
>>>>>>>> the agents were stopped.
>>>>>>>>
>>>>>>>> Other times, these jobs work correctly, and once they even worked
>>>>>>>> correctly together, running at the same time.
>>>>>>>>
>>>>>>>> For information:
>>>>>>>>
>>>>>>>>    - On JBoss there are only Manifold and the Generic Repository
>>>>>>>>    application.
>>>>>>>>    - On the same virtual server, there is another JBoss instance,
>>>>>>>>    with a Solr instance and a web application.
>>>>>>>>    - I've checked whether it was a memory problem, but it's not.
>>>>>>>>    - GC with almost 23k seeds always works, at least in the tests
>>>>>>>>    I've done.
>>>>>>>>    - In a local JBoss instance with Manifold and the Generic
>>>>>>>>    Repository application, I haven't had this problem.
>>>>>>>>
>>>>>>>> This is the only recurring message I've seen in manifold.log:
>>>>>>>> ---------------
>>>>>>>> Connection 0.0.0.0:62755<-><ip-address>:<port> shut down
>>>>>>>> Releasing connection
>>>>>>>> org.apache.http.impl.conn.ManagedClientConnectionImpl@6c98c1bd
>>>>>>>>
>>>>>>>> ---------------
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> L. Alicata
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
