Hi, I'm not sure I understand the whole situation, but if you want a Scala solution, I think you could use regular expressions to match the lines and capture the keywords. Here is one I wrote that you can modify on your end:
import scala.io.Source
import scala.collection.mutable.ArrayBuffer

// Lines with two or more '#' separators, e.g. "Plano#Código Beneficiário#Nome Beneficiário"
val list1 = ArrayBuffer[(String, String, String)]()
// Lines with exactly one '#', e.g. "Operadora#AMIL"
val list2 = ArrayBuffer[(String, String)]()

// (.*) is greedy, so on lines with more than two '#' the extra fields end up in the first group
val patt1 = """^(.*)#(.*)#([^#]*)$""".r
val patt2 = """^(.*)#([^#]*)$""".r

val file = "1.txt"
// Uses the platform default encoding; pass a charset, e.g. Source.fromFile(file, "UTF-8"), if accents come out garbled
val source = Source.fromFile(file)
try {
  for (x <- source.getLines()) {
    x match {
      case patt1(k, v, z) => list1 += ((k, v, z))
      case patt2(k, v)    => list2 += ((k, v))
      case _              => println("no match") // blank lines, the report title, etc.
    }
  }
} finally {
  source.close() // release the file handle once all lines are consumed
}

Now list1 and list2 hold the elements you want, and you can easily convert them to a DataFrame.

Thanks.

On Wed, Feb 9, 2022 at 7:20 PM Danilo Sousa <danilosousa...@gmail.com> wrote:
> Hello
>
> Yes, I can open this block as CSV with a # delimiter, but there is another block that is not in CSV format.
>
> That block is more like key-value pairs.
>
> We have two different layouts in the same file. This is the “problem”.
>
> Thanks for your time.
>
>> Relação de Beneficiários Ativos e Excluídos
>> Carteira em#27/12/2019##Todos os Beneficiários
>> Operadora#AMIL
>> Filial#SÃO PAULO#Unidade#Guarulhos
>>
>> Contrato#123456 - Test
>> Empresa#Test
>
> On 9 Feb 2022, at 00:58, Bitfox <bit...@bitfox.top> wrote:
>
> Hello
>
> You can treat it as a CSV file and load it with Spark:
>
> >>> df = spark.read.format("csv").option("inferSchema", "true").option("header", "true").option("sep","#").load(csv_file)
> >>> df.show()
> +--------------------+-------------------+-----------------+
> |               Plano|Código Beneficiário|Nome Beneficiário|
> +--------------------+-------------------+-----------------+
> |58693 - NACIONAL ...|           65751353|       Jose Silva|
> |58693 - NACIONAL ...|           65751388|      Joana Silva|
> |58693 - NACIONAL ...|           65751353|     Felipe Silva|
> |58693 - NACIONAL ...|           65751388|      Julia Silva|
> +--------------------+-------------------+-----------------+
>
> cat csv_file:
>
> Plano#Código Beneficiário#Nome Beneficiário
> 58693 - NACIONAL R COPART PJCE#065751353#Jose Silva
> 58693 - NACIONAL R COPART PJCE#065751388#Joana Silva
> 58693 - NACIONAL R COPART PJCE#065751353#Felipe Silva
> 58693 - NACIONAL R COPART PJCE#065751388#Julia Silva
>
> Regards
>
> On Wed, Feb 9, 2022 at 12:50 AM Danilo Sousa <danilosousa...@gmail.com> wrote:
>> Hi
>> I have to transform unstructured text into a DataFrame.
>> Could anyone please help with Scala code?
>>
>> The DataFrame needs these columns:
>>
>> operadora filial unidade contrato empresa plano codigo_beneficiario nome_beneficiario
>>
>> Relação de Beneficiários Ativos e Excluídos
>> Carteira em#27/12/2019##Todos os Beneficiários
>> Operadora#AMIL
>> Filial#SÃO PAULO#Unidade#Guarulhos
>>
>> Contrato#123456 - Test
>> Empresa#Test
>> Plano#Código Beneficiário#Nome Beneficiário
>> 58693 - NACIONAL R COPART PJCE#073930312#Joao Silva
>> 58693 - NACIONAL R COPART PJCE#073930313#Maria Silva
>>
>> Contrato#898011000 - FUNDACAO GERDAU
>> Empresa#FUNDACAO GERDAU
>> Plano#Código Beneficiário#Nome Beneficiário
>> 58693 - NACIONAL R COPART PJCE#065751353#Jose Silva
>> 58693 - NACIONAL R COPART PJCE#065751388#Joana Silva
>> 58693 - NACIONAL R COPART PJCE#065751353#Felipe Silva
>> 58693 - NACIONAL R COPART PJCE#065751388#Julia Silva
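One more idea for the layout in the original question: the regex pass in the reply separates lines by shape, but the requested DataFrame needs every beneficiary line to carry the header values (Operadora, Filial, Unidade, Contrato, Empresa) that precede it. Below is a minimal sketch of that "carry forward" approach. It assumes the keys are spelled exactly as in the sample and that the file is walked sequentially in order; `Beneficiario` and `FlattenReport.flatten` are hypothetical names made up for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical output row; one field per column requested in the question.
final case class Beneficiario(
    operadora: String,
    filial: String,
    unidade: String,
    contrato: String,
    empresa: String,
    plano: String,
    codigoBeneficiario: String,
    nomeBeneficiario: String
)

object FlattenReport {
  // Walks the report top to bottom, remembering the most recent header values
  // and emitting one row per "plano#codigo#nome" line.
  def flatten(lines: Iterator[String]): Vector[Beneficiario] = {
    var operadora, filial, unidade, contrato, empresa = ""
    val out = ArrayBuffer.empty[Beneficiario]
    for (raw <- lines) {
      // limit -1 keeps empty fields, e.g. the "##" in "Carteira em#27/12/2019##..."
      raw.trim.split("#", -1).toList match {
        case List("Operadora", v) => operadora = v
        case List("Filial", f, "Unidade", u) =>
          filial = f
          unidade = u
        case List("Contrato", v) => contrato = v
        case List("Empresa", v)  => empresa = v
        case "Plano" :: _        => () // column header line, skip it
        case List(plano, codigo, nome) =>
          out += Beneficiario(operadora, filial, unidade, contrato, empresa, plano, codigo, nome)
        case _ => () // report title, "Carteira em" line, blank lines
      }
    }
    out.toVector
  }
}
```

For a file that fits on the driver, passing `scala.io.Source.fromFile(path).getLines()` into `flatten` should work. Then, with a `SparkSession` named `spark` in scope, `import spark.implicits._` followed by `rows.toDF("operadora", "filial", "unidade", "contrato", "empresa", "plano", "codigo_beneficiario", "nome_beneficiario")` would give the column names from the question. Because the walk is stateful and depends on line order, it would not work unchanged line by line across Spark partitions.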