Yes, this is only a single file. Thanks, Rafael Mendes.
> On 13 Feb 2022, at 07:13, Rafael Mendes <rafaelpir...@gmail.com> wrote:
>
> Hi, Danilo.
> Do you have only a single large file?
> If so, I guess you can use tools like sed/awk to split it into several files
> based on layout, so you can read each of those files into Spark.
>
>
> On Wed, 9 Feb 2022 at 09:30, Bitfox <bit...@bitfox.top> wrote:
> Hi
>
> I am not sure about the whole situation.
> But if you want a Scala solution, I think you could use regexes to match and
> capture the keywords.
> Here is one I wrote that you can modify on your end.
>
> import scala.io.Source
> import scala.collection.mutable.ArrayBuffer
>
> val list1 = ArrayBuffer[(String,String,String)]()
> val list2 = ArrayBuffer[(String,String)]()
>
>
> val patt1 = """^(.*)#(.*)#([^#]*)$""".r
> val patt2 = """^(.*)#([^#]*)$""".r
>
> val file = "1.txt"
> val lines = Source.fromFile(file).getLines()
>
> for ( x <- lines ) {
>   x match {
>     case patt1(k,v,z) => list1 += ((k,v,z))
>     case patt2(k,v) => list2 += ((k,v))
>     case _ => println("no match")
>   }
> }
>
>
> Now list1 and list2 hold the elements you wanted, and you can convert them to
> a dataframe easily.
>
> Thanks.
>
> On Wed, Feb 9, 2022 at 7:20 PM Danilo Sousa <danilosousa...@gmail.com
> <mailto:danilosousa...@gmail.com>> wrote:
> Hello
>
>
> Yes, I can open this block as CSV with a # delimiter, but there is another
> block that is not in CSV format.
>
> It looks more like key-value pairs.
>
> We have two different layouts in the same file. This is the “problem”.
>
> Thanks for your time.
>
>> Relação de Beneficiários Ativos e Excluídos
>> Carteira em#27/12/2019##Todos os Beneficiários
>> Operadora#AMIL
>> Filial#SÃO PAULO#Unidade#Guarulhos
>>
>> Contrato#123456 - Test
>> Empresa#Test
>
>> On 9 Feb 2022, at 00:58, Bitfox <bit...@bitfox.top
>> <mailto:bit...@bitfox.top>> wrote:
>>
>> Hello
>>
>> You can treat it as a CSV file and load it with Spark:
>>
>> >>> df = spark.read.format("csv").option("inferSchema", "true").option("header", "true").option("sep","#").load(csv_file)
>> >>> df.show()
>> +--------------------+-------------------+-----------------+
>> |               Plano|Código Beneficiário|Nome Beneficiário|
>> +--------------------+-------------------+-----------------+
>> |58693 - NACIONAL ...|           65751353|       Jose Silva|
>> |58693 - NACIONAL ...|           65751388|      Joana Silva|
>> |58693 - NACIONAL ...|           65751353|     Felipe Silva|
>> |58693 - NACIONAL ...|           65751388|      Julia Silva|
>> +--------------------+-------------------+-----------------+
>>
>>
>> cat csv_file:
>>
>> Plano#Código Beneficiário#Nome Beneficiário
>> 58693 - NACIONAL R COPART PJCE#065751353#Jose Silva
>> 58693 - NACIONAL R COPART PJCE#065751388#Joana Silva
>> 58693 - NACIONAL R COPART PJCE#065751353#Felipe Silva
>> 58693 - NACIONAL R COPART PJCE#065751388#Julia Silva
>>
>>
>> Regards
>>
>>
>> On Wed, Feb 9, 2022 at 12:50 AM Danilo Sousa <danilosousa...@gmail.com
>> <mailto:danilosousa...@gmail.com>> wrote:
>> Hi
>> I have to transform unstructured text into a dataframe.
>> Could anyone please help with Scala code?
>>
>> The dataframe needs these columns:
>>
>> operadora filial unidade contrato empresa plano codigo_beneficiario
>> nome_beneficiario
>>
>> Relação de Beneficiários Ativos e Excluídos
>> Carteira em#27/12/2019##Todos os Beneficiários
>> Operadora#AMIL
>> Filial#SÃO PAULO#Unidade#Guarulhos
>>
>> Contrato#123456 - Test
>> Empresa#Test
>> Plano#Código Beneficiário#Nome Beneficiário
>> 58693 - NACIONAL R COPART PJCE#073930312#Joao Silva
>> 58693 - NACIONAL R COPART PJCE#073930313#Maria Silva
>>
>> Contrato#898011000 - FUNDACAO GERDAU
>> Empresa#FUNDACAO GERDAU
>> Plano#Código Beneficiário#Nome Beneficiário
>> 58693 - NACIONAL R COPART PJCE#065751353#Jose Silva
>> 58693 - NACIONAL R COPART PJCE#065751388#Joana Silva
>> 58693 - NACIONAL R COPART PJCE#065751353#Felipe Silva
>> 58693 - NACIONAL R COPART PJCE#065751388#Julia Silva
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>> <mailto:user-unsubscr...@spark.apache.org>
>>
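
For the two-layout file described in this thread, one possible approach is to walk the lines once, remember the most recent key-value header fields, and attach them to every three-field beneficiary row. The sketch below is plain Scala with no Spark dependency. The names `Beneficiario`, `LayoutParser`, and `Demo` are hypothetical, and the parsing rules (which keys exist, how many fields each layout has) are assumptions inferred only from the sample data quoted above, so they may need adjusting for the real file.

```scala
// Sketch only: names and parsing rules are assumptions based on the sample
// file quoted in this thread, not a confirmed specification of the layout.
final case class Beneficiario(
  operadora: String, filial: String, unidade: String,
  contrato: String, empresa: String,
  plano: String, codigoBeneficiario: String, nomeBeneficiario: String)

object LayoutParser {
  // Header values seen so far; they are copied onto each beneficiary row.
  private final case class Ctx(
    operadora: String = "", filial: String = "", unidade: String = "",
    contrato: String = "", empresa: String = "")

  def parse(lines: Iterator[String]): Vector[Beneficiario] = {
    val (_, rows) = lines.foldLeft((Ctx(), Vector.empty[Beneficiario])) {
      case ((ctx, acc), raw) =>
        // A limit of -1 keeps empty fields, e.g. the "##" in the "Carteira em" line.
        raw.trim.split("#", -1) match {
          // Key-value layout: update the running context.
          case Array("Operadora", op)           => (ctx.copy(operadora = op), acc)
          case Array("Filial", f, "Unidade", u) => (ctx.copy(filial = f, unidade = u), acc)
          case Array("Contrato", c)             => (ctx.copy(contrato = c), acc)
          case Array("Empresa", e)              => (ctx.copy(empresa = e), acc)
          // Column header of the tabular layout: skip it.
          case Array("Plano", _, _)             => (ctx, acc)
          // Tabular layout: any other three-field line is treated as a data row.
          case Array(plano, codigo, nome) =>
            (ctx, acc :+ Beneficiario(ctx.operadora, ctx.filial, ctx.unidade,
              ctx.contrato, ctx.empresa, plano, codigo, nome))
          // Title, blank, separator and footer lines are ignored.
          case _ => (ctx, acc)
        }
    }
    rows
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    // A few lines copied from the sample in the original message.
    val sample = Iterator(
      "Relação de Beneficiários Ativos e Excluídos",
      "Carteira em#27/12/2019##Todos os Beneficiários",
      "Operadora#AMIL",
      "Filial#SÃO PAULO#Unidade#Guarulhos",
      "",
      "Contrato#123456 - Test",
      "Empresa#Test",
      "Plano#Código Beneficiário#Nome Beneficiário",
      "58693 - NACIONAL R COPART PJCE#073930312#Joao Silva",
      "58693 - NACIONAL R COPART PJCE#073930313#Maria Silva")
    LayoutParser.parse(sample).foreach(println)
  }
}
```

For the real file, the iterator could come from `scala.io.Source.fromFile(...).getLines()`, as in the earlier reply. With a `SparkSession` named `spark` in scope, `import spark.implicits._` followed by `rows.toDF()` should turn the result into a dataframe, although the column names would then follow the case class fields rather than the snake_case names requested. Keeping the beneficiary code as a `String` preserves the leading zero that `inferSchema` dropped in the `df.show()` output above. Since the whole file is parsed on the driver, this only suits a file that fits comfortably in memory.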